CAN THE ADAPTIVE METROPOLIS ALGORITHM COLLAPSE 
WITHOUT THE COVARIANCE LOWER BOUND? 



MATTI VIHOLA 

Abstract. The Adaptive Metropolis (AM) algorithm is based on the symmetric 
random-walk Metropolis algorithm. The proposal distribution has the following 
time-dependent covariance matrix at step n + 1 

$S_n = \mathrm{Cov}(X_1, \ldots, X_n) + \epsilon I,$

that is, the sample covariance matrix of the history of the chain plus a (small) constant
$\epsilon > 0$ multiple of the identity matrix $I$. The lower bound on the eigenvalues of
$S_n$ induced by the factor $\epsilon I$ is theoretically convenient, but practically cumbersome,
as a good value for the parameter $\epsilon$ may not always be easy to choose. This article
considers variants of the AM algorithm that do not explicitly bound the eigenvalues 
of Sn away from zero. The behaviour of Sn is studied in detail, indicating that the 
eigenvalues of Sn do not tend to collapse to zero in general. In dimension one, it is 
shown that Sn is bounded away from zero if the logarithmic target density is uni- 
formly continuous. For a modification of the AM algorithm including an additional 
fixed component in the proposal distribution, the eigenvalues of Sn are shown to 
stay away from zero with a practically non-restrictive condition. This result implies 
a strong law of large numbers for super-exponentially decaying target distributions 
with regular contours. 



1. Introduction 

Adaptive Markov chain Monte Carlo (MCMC) methods have attracted increasing
interest in the last few years, after the original work of Haario, Saksman, and
Tamminen [9] and the subsequent advances in the field [ll, 12, la, ll3|; see also the
recent review. Several adaptive MCMC algorithms have been proposed to date, but
the seminal Adaptive Metropolis (AM) algorithm is still one of the most applied 
methods, perhaps due to its simplicity and generality. 

The AM algorithm is a symmetric random-walk Metropolis algorithm, with an
adaptive proposal distribution. The algorithm starts at some point $X_1 \equiv x_1 \in \mathbb{R}^d$
with an initial positive definite covariance matrix $S_1 \equiv s_1 \in \mathbb{R}^{d \times d}$ and follows the
recursion

(S1) Let $Y_{n+1} = X_n + \theta S_n^{1/2} W_{n+1}$, where $W_{n+1}$ is an independent standard Gaussian
random vector and $\theta > 0$ is a constant.

(S2) Accept $Y_{n+1}$ with probability $\min\{1, \pi(Y_{n+1})/\pi(X_n)\}$ and let $X_{n+1} = Y_{n+1}$; otherwise
reject $Y_{n+1}$ and let $X_{n+1} = X_n$.

(S3) Set $S_{n+1} = \Gamma(X_1, \ldots, X_{n+1})$.
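The three steps above can be sketched in code. The following is only a minimal one-dimensional illustration, assuming that the covariance rule is the sample variance of the history plus a constant `eps`, as described in the abstract. The function name, its arguments and the use of running sums are choices made for this sketch and are not taken from the paper.

```python
import math
import random


def adaptive_metropolis_1d(log_pi, x1, s1, theta, eps, n_steps, seed=0):
    """One-dimensional sketch of steps (S1)-(S3); all names are hypothetical."""
    rng = random.Random(seed)
    x, s = x1, s1
    chain = [x1]
    total, total_sq = x1, x1 * x1
    for _ in range(n_steps):
        # (S1) propose with standard deviation theta * sqrt(S_n)
        y = x + theta * math.sqrt(s) * rng.gauss(0.0, 1.0)
        # (S2) accept with probability min{1, pi(y) / pi(x)}
        log_ratio = log_pi(y) - log_pi(x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = y
        chain.append(x)
        total += x
        total_sq += x * x
        # (S3) sample variance of the history plus eps
        count = len(chain)  # equals n + 1
        mean = total / count
        centred = max(total_sq - count * mean * mean, 0.0)
        s = centred / (count - 1) + eps
    return chain, s
```

With `eps = 0` the variance parameter in this sketch is no longer bounded from below, and a run of rejections right at the start would make the proposal degenerate; the unconstrained variants studied later use a recursive update started from a positive value instead.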



In the original work [9] the covariance parameter is computed by

(1) $\Gamma(X_1, \ldots, X_{n+1}) := \frac{1}{n} \sum_{k=1}^{n+1} (X_k - \bar{X}_{n+1})(X_k - \bar{X}_{n+1})^T + \epsilon I,$

where $\bar{X}_n := n^{-1} \sum_{k=1}^{n} X_k$ stands for the mean. That is, $S_{n+1}$ is a covariance estimate
of the history of the 'Metropolis chain' $X_1, \ldots, X_{n+1}$ plus a small $\epsilon > 0$ multiple of the
identity matrix $I \in \mathbb{R}^{d \times d}$. The authors prove a strong law of large numbers (SLLN) for
the algorithm, that is, $n^{-1} \sum_{k=1}^{n} f(X_k) \to \int_{\mathbb{R}^d} f(x) \pi(x) \, dx$ almost surely as $n \to \infty$ for
any bounded functional $f$ when the target distribution $\pi$ is bounded and compactly
supported. Recently, SLLN was shown to hold also for $\pi$ with unbounded support,
having super-exponentially decaying tails with regular contours and $f$ growing at
most exponentially in the tails [17].



This article considers the original AM algorithm (S1)-(S3), without the lower
bound induced by the factor $\epsilon I$. The proposal covariance function $\Gamma$, defined precisely
in Section 2, is a consistent covariance estimator first proposed in [2]. A special case
of this estimator behaves asymptotically like the sample covariance in (1). Previous
results indicate that if this algorithm is modified by truncating the eigenvalues of
$S_n$ within explicit lower and upper bounds, the algorithm can be verified in a fairly
general setting [3, 13]. It is also possible to determine an increasing sequence of
truncation sets for $S_n$, and modify the algorithm to include a re-projection scheme
in order to verify the validity of the algorithm py].

While technically convenient, such pre-defined bounds on the adapted covariance
matrix $S_n$ are inconvenient in practice. Ill-defined values can affect the efficiency
of the adaptive scheme dramatically, rendering the algorithm useless in the worst
case. In particular, if the factor $\epsilon > 0$ in the AM algorithm is selected too large, the
smallest eigenvalue of the true covariance matrix of $\pi$ may be well smaller than $\epsilon$,
and the chain $X_n$ is likely to mix poorly. Even though the re-projection scheme of
[H avoids such behaviour by increasing truncation sets, which eventually contain the
desirable values of the adaptation parameter, the practical efficiency of the algorithm
is still strongly affected by the choice of these sets.

After defining precisely the algorithms in Section 2, the above mentioned unconstrained
AM algorithm is analysed in Section 3. First, it is studied how the AM
algorithm run on an improper uniform target $\pi \equiv c > 0$ behaves. It is also shown
that in a one-dimensional setting and with a uniformly continuous $\log \pi$, the variance
parameter $S_n$ is bounded away from zero. This fact is shown to imply, with the
results in [17], a SLLN in the particular case of a Laplace target distribution. While
this result has little practical value in its own right, it is the first case where the
unconstrained AM algorithm is shown to preserve the correct ergodic properties. It
shows that the algorithm possesses self-stabilising properties and further strengthens
the belief that the algorithm would be stable and ergodic under a more general setting.
The results of Section 3 also give some insight into the behaviour of the adaptive
chain that can be helpful when the algorithm is applied in practice.

Section 4 considers a slightly different variant of the AM algorithm, due to Roberts
and Rosenthal [3], replacing (S1) with

(S1') With probability $\beta$, let $Y_{n+1} = X_n + V_{n+1}$ where $V_{n+1}$ is an independent sample
of $q_{\mathrm{fix}}$; otherwise, let $Y_{n+1} = X_n + \theta S_n^{1/2} W_{n+1}$ as in (S1).

While omitting the parameter $\epsilon > 0$, the proposal strategy (S1') includes two additional
parameters: the mixing probability $\beta \in (0, 1)$ and the fixed symmetric proposal
distribution $q_{\mathrm{fix}}$. It has the advantage that the 'worst case scenario' having ill-defined
$q_{\mathrm{fix}}$ only 'wastes' the fixed proportion $\beta$ of samples, while $S_n$ can take any positive
definite value on adaptation. This approach is analysed also in the recent preprint
[3], relying on a technical assumption that ultimately implies that $X_n$ is bounded
in probability. In particular, the authors show that if $q_{\mathrm{fix}}$ is a uniform density on a
ball having a large enough radius, then the algorithm is ergodic. Section 4 uses a
perhaps more transparent argument to show that the proposal strategy (S1') with a
mild additional condition implies a sequence $S_n$ with eigenvalues bounded away from
zero. This fact implies a SLLN using the technique of [17], as shown at the end of
Section 4.



2. The General Algorithm 

Let us define a Markov chain $(X_n, M_n, S_n)_{n \ge 1}$ evolving in space $\mathbb{R}^d \times \mathbb{R}^d \times \mathcal{C}^d$ with
the state space $\mathbb{R}^d$ and $\mathcal{C}^d \subset \mathbb{R}^{d \times d}$ standing for the positive definite matrices. The
chain starts at an initial position $X_1 \equiv x_1 \in \mathbb{R}^d$, with an initial mean¹ $M_1 \equiv m_1 \in \mathbb{R}^d$
and an initial covariance matrix $S_1 \equiv s_1 \in \mathcal{C}^d$. For $n \ge 1$, the chain is defined through
the recursion

(2) $X_{n+1} \sim P_{q_{S_n}}(X_n, \cdot)$

(3) $M_{n+1} := (1 - \eta_{n+1}) M_n + \eta_{n+1} X_{n+1}$

(4) $S_{n+1} := (1 - \eta_{n+1}) S_n + \eta_{n+1} (X_{n+1} - M_n)(X_{n+1} - M_n)^T.$

Denoting the natural filtration of the chain as $\mathcal{F}_n := \sigma(X_k, M_k, S_k : 1 \le k \le n)$,
the notation in (2) reads that $P(X_{n+1} \in A \mid \mathcal{F}_n) = P_{q_{S_n}}(X_n, A)$ for any measurable
$A \subset \mathbb{R}^d$. The Metropolis transition kernel $P_q$ is defined for any symmetric probability
density $q(x, y) = q(x - y)$ through

$P_q(x, A) := 1_A(x) \left[ 1 - \int \min\left\{1, \frac{\pi(y)}{\pi(x)}\right\} q(y - x) \, dy \right] + \int_A \min\left\{1, \frac{\pi(y)}{\pi(x)}\right\} q(y - x) \, dy$

where $1_A$ stands for the characteristic function of the set $A$. The proposal densities
$\{q_s\}_{s \in \mathcal{C}^d}$ are defined as a mixture

(5) $q_s(z) := (1 - \beta) \tilde{q}_s(z) + \beta q_{\mathrm{fix}}(z)$

where the mixing constant $\beta \in [0, 1)$ determines how often a fixed proposal density
$q_{\mathrm{fix}}$ is used instead of the adaptive proposal
$\tilde{q}_s(z) := \det(\theta^2 s)^{-1/2} \tilde{q}(\theta^{-1} s^{-1/2} z)$
with $\tilde{q}$ being a 'template' probability density. Finally, the
adaptation weights $(\eta_n)_{n \ge 2} \subset (0, 1)$ appearing in (3) and (4) are assumed to decay to
zero.

¹ A customary choice is to set $m_1 = x_1$.
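The general algorithm of this section can be sketched as follows in dimension one. This is an illustrative sketch only: it assumes a standard Gaussian template density, a Gaussian fixed proposal with standard deviation `fix_sd`, and the scaling of the adaptive proposal by `theta * sqrt(s)`. The function and argument names are hypothetical.

```python
import math
import random


def general_am_1d(log_pi, x1, m1, s1, theta, beta, fix_sd, eta, n_steps, seed=0):
    """Sketch of recursion (2)-(4) with the mixture proposal (5), d = 1."""
    rng = random.Random(seed)
    x, m, s = x1, m1, s1
    accepted = 0
    for n in range(1, n_steps + 1):
        # Mixture proposal (5): fixed component with probability beta,
        # adaptive component otherwise. Both are symmetric.
        if rng.random() < beta:
            y = x + fix_sd * rng.gauss(0.0, 1.0)
        else:
            y = x + theta * math.sqrt(s) * rng.gauss(0.0, 1.0)
        # Metropolis accept/reject step, recursion (2)
        log_ratio = log_pi(y) - log_pi(x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = y
            accepted += 1
        w = eta(n + 1)
        # (4) uses the previous mean M_n, so the variance is updated first
        s = (1.0 - w) * s + w * (x - m) ** 2
        # (3) mean update
        m = (1.0 - w) * m + w * x
    return x, m, s, accepted / n_steps
```

Setting `beta = 0` gives the unconstrained variant without a fixed component. Because each weight lies in (0, 1), the variance parameter in this sketch stays strictly positive whenever it starts positive, although nothing in the code prevents it from becoming very small.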

One can verify that for $\beta = 0$ this setting corresponds to the algorithm (S1)-(S3)
of Section 1 with $W_{n+1}$ having distribution $\tilde{q}$, and for $\beta \in (0, 1)$, (S1') applies instead
of (S1). Notice also that the original AM algorithm essentially fits this setting, with
$\eta_n := n^{-1}$, $\beta := 0$ and if $\tilde{q}_s$ is defined slightly differently, being a Gaussian density
with mean zero and covariance $s + \epsilon I$. Moreover, if one sets $\beta = 1$, the above setting
reduces to a non-adaptive symmetric random walk Metropolis algorithm with the
increment proposal distribution $q_{\mathrm{fix}}$.

3. The Unconstrained AM Algorithm 

3.1. Overview of the Results. This section deals with the unconstrained AM
algorithm, that is, the algorithm described in Section 2 with the mixing constant
$\beta = 0$ in (5). Sections 3.2 and 3.3 consider the case of an improper uniform target
distribution $\pi \equiv c$ for some constant $c > 0$. This implies that (almost) every proposed
sample is accepted and the recursion (2) reduces to

(6) $X_{n+1} = X_n + \theta S_n^{1/2} W_{n+1}$

where $(W_n)_{n \ge 2}$ are independent realisations of the distribution $\tilde{q}$.
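The 'adaptive random walk' obtained with a flat target can be simulated directly. The sketch below is one-dimensional and assumes Gaussian increments, weights of the form c n^(-gamma), and the recursive mean and variance updates (3) and (4); the function name and arguments are hypothetical.

```python
import math
import random


def adaptive_random_walk_1d(s1, theta, c, gamma, n_steps, seed=0):
    """Simulate the variance parameter along the adaptive random walk, d = 1."""
    rng = random.Random(seed)
    x = 0.0
    m = 0.0
    s = s1
    path = [s]  # path[k] holds S_{k+1}
    for n in range(1, n_steps + 1):
        # every proposal is accepted when the target is flat
        x = x + theta * math.sqrt(s) * rng.gauss(0.0, 1.0)
        w = c * (n + 1) ** (-gamma)  # weight eta_{n+1}
        s = (1.0 - w) * s + w * (x - m) ** 2  # variance update with the old mean
        m = (1.0 - w) * m + w * x  # mean update
        path.append(s)
    return path
```

Since each update multiplies the previous value by at least one minus the weight, the simulated variance with weights 1/n can never fall below s1/n, which gives a simple deterministic sanity check on any run.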

Throughout this subsection, let us assume that the template proposal distribution
$\tilde{q}$ is spherically symmetric and the weight sequence is defined as $\eta_n := c n^{-\gamma}$ for some
constants $c \in (0, 1]$ and $\gamma \in (1/2, 1]$. The first result characterises the expected
behaviour of $S_n$ when $(X_n)_{n \ge 2}$ follows (6).

Theorem 1. Suppose $(X_n)_{n \ge 2}$ follows the 'adaptive random walk' recursion (6), with
$E[W_n W_n^T] = I$. Then, for all $\lambda > 1$ there is $n_0 \ge m$ such that for all $n \ge n_0$ and
$k \ge 1$, the following bounds hold

$\frac{1}{\lambda} \theta \sum_{j=n+1}^{n+k} \sqrt{\eta_j} \le \log\left( \frac{E[S_{n+k}]}{E[S_n]} \right) \le \lambda \theta \sum_{j=n+1}^{n+k} \sqrt{\eta_j}.$

Figure 1. An example of the exact development of $E[S_n]$, when $s_1 = 1$
and $\theta = 0.01$. The sequence $(E[S_n])_{n \ge 1}$ decreases until $n$ is over 27,000
and exceeds the initial value only with $n$ over 750,000.

Proof. Theorem 1 is a special case of Theorem 12 in Section 3.2. □

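Remark 2. Theorem 1 implies that with the choice $\eta_n := c n^{-\gamma}$ for some $c \in (0, 1)$
and $\gamma \in (1/2, 1]$, the expectation grows with the speed

$E[S_n] \simeq \exp\left( \frac{\theta \sqrt{c}}{1 - \gamma/2} \, n^{1 - \gamma/2} \right).$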

Remark 3. In the original setting [9] the weights are defined as $\eta_n := n^{-1}$ and Theorem 1
implies that the asymptotic growth rate of $E[S_n]$ is $e^{2\theta\sqrt{n}}$ when $(X_n)_{n \ge 2}$ follows (6).
Suppose the value of $S_n$ is very small compared to the scale of a smooth target
distribution $\pi$. Then, it is expected that most of the proposals are accepted, $X_n$ behaves
almost as (6), and $S_n$ is expected to grow approximately at the rate $e^{2\theta\sqrt{n}}$ until
it reaches the correct magnitude. On the other hand, a simple deterministic bound
implies that $S_n$ can decay slowly, only with the polynomial speed $n^{-1}$. Therefore, it
may be safer to choose the initial $s_1$ small.

Remark 4. The selection of the scaling parameter $\theta > 0$ in the AM algorithm does
not seem to affect the expected asymptotic behaviour of $S_n$ dramatically. However, the
choice $0 < \theta \ll 1$ can result in a significant initial 'dip' of the adapted covariance
values, as exemplified in Figure 1. Therefore, the values $\theta \ll 1$ are to be used with
care. In this case, the significance of a successful burn-in is also emphasised.

It may seem that Theorem 1 would automatically ensure that $S_n \to \infty$ also
path-wise. This is not, however, the case. For example, consider the probability
space $[0, 1]$ with the Borel $\sigma$-algebra and the Lebesgue measure. Then $(M_n, \mathcal{F}_n)_{n \ge 1}$
defined as $M_n := 2^{2n} 1_{[0, 2^{-n})}$ and $\mathcal{F}_n := \sigma(M_k : 1 \le k \le n)$ is, in fact, a submartingale.
Moreover, $E M_n = 2^n \to \infty$, but $M_n \to 0$ almost surely.
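The counterexample can be checked with exact arithmetic. The sketch below assumes the process takes the value 2^(2n) on the interval [0, 2^(-n)) and zero elsewhere, which is the form consistent with the stated expectation 2^n; the function names are hypothetical.

```python
from fractions import Fraction


def m_value(n, omega):
    """Value of M_n at the sample point omega in [0, 1]."""
    return 4 ** n if omega < Fraction(1, 2 ** n) else 0


def m_expectation(n):
    """E[M_n]: the value 2^(2n) times the Lebesgue measure 2^(-n)."""
    return Fraction(4 ** n, 2 ** n)


# The expectation doubles at every step, while any fixed omega > 0
# leaves the shrinking interval after finitely many steps.
omega = Fraction(1, 10)
values = [m_value(n, omega) for n in range(1, 8)]
expectations = [m_expectation(n) for n in range(1, 8)]
```

For `omega = 1/10` the path equals 4, 16, 64 and is zero from the fourth step on, whereas the expectations keep growing geometrically.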

The AM process, however, does produce an unbounded sequence $S_n$.

Theorem 5. Assume that $(X_n)_{n \ge 2}$ follows the 'adaptive random walk' recursion (6).
Then, for any unit vector $u \in \mathbb{R}^d$, the process $u^T S_n u \to \infty$ almost surely.

Proof. Theorem 5 is a special case of Theorem 20 in Section 3.3. □



In a one-dimensional setting, and when $\log \pi$ is uniformly continuous, the AM
process can be approximated with the 'adaptive random walk' above, whenever $S_n$
is small enough. This yields

Theorem 6. Assume $d = 1$ and $\log \pi$ is uniformly continuous. Then, there is a
constant $b > 0$ such that $\liminf_{n \to \infty} S_n \ge b$.

Proof. Theorem 6 is a special case of Theorem 21 in Section 3.4. □

Finally, having Theorem 6 it is possible to establish

Theorem 7. Assume $\tilde{q}$ is Gaussian, the one-dimensional target distribution is standard
Laplace $\pi(x) := \frac{1}{2} e^{-|x|}$ and the functional $f : \mathbb{R} \to \mathbb{R}$ satisfies $\sup_x e^{-\gamma |x|} |f(x)| < \infty$
for some $\gamma \in (0, 1/2)$. Then, $n^{-1} \sum_{k=1}^{n} f(X_k) \to \int f(x) \pi(x) \, dx$ almost surely as
$n \to \infty$.

Proof. Theorem 7 is a special case of Theorem 23 in Section 3.4. □

Remark 8. In the case $\eta_n := n^{-1}$, Theorem 7 implies that the parameters $M_n$ and $S_n$
of the adaptive chain converge to 0 and 2, that is, the true mean and variance of the
target distribution $\pi$, respectively.

Remark 9. Theorem 6 (and Theorem 7) could probably be extended to cover also
targets $\pi$ with compact supports. Such an extension would, however, require specific
handling of the boundary effects, which can lead to technicalities.

3.2. Uniform Target: Expected Growth Rate. Define the following matrix
quantities

(7) $a_n := E\left[ (X_n - M_{n-1})(X_n - M_{n-1})^T \right]$

(8) $b_n := E[S_n]$

for $n \ge 1$, with the convention that $a_1 \equiv 0 \in \mathbb{R}^{d \times d}$. One may write using (3) and (6)

$X_{n+1} - M_n = X_n - M_n + \theta S_n^{1/2} W_{n+1} = (1 - \eta_n)(X_n - M_{n-1}) + \theta S_n^{1/2} W_{n+1}.$

If $E[W_n W_n^T] = I$, one may easily compute

$E\left[ (X_{n+1} - M_n)(X_{n+1} - M_n)^T \right] = (1 - \eta_n)^2 E\left[ (X_n - M_{n-1})(X_n - M_{n-1})^T \right] + \theta^2 E[S_n]$

since $W_{n+1}$ is independent of $\mathcal{F}_n$ and zero-mean due to the symmetry of $\tilde{q}$. The values
of $(a_n)_{n \ge 2}$ and $(b_n)_{n \ge 2}$ are therefore determined by the joint recursion

(9) $a_{n+1} = (1 - \eta_n)^2 a_n + \theta^2 b_n$

(10) $b_{n+1} = (1 - \eta_{n+1}) b_n + \eta_{n+1} a_{n+1}.$
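The joint recursion is easy to iterate numerically in the scalar case. The sketch below assumes the recursion has the form a_{n+1} = (1 - eta_n)^2 a_n + theta^2 b_n and b_{n+1} = (1 - eta_{n+1}) b_n + eta_{n+1} a_{n+1}, with a_1 = 0 and weights 1/n by default; the function name and arguments are hypothetical.

```python
def expected_variance_path(s1, theta, n_max, eta=lambda n: 1.0 / n):
    """Return [b_1, ..., b_{n_max}] from the scalar joint recursion, a_1 = 0."""
    a = 0.0
    b = s1
    bs = [b]
    for n in range(1, n_max):
        # The factor (1 - eta_n)^2 plays no role at n = 1 because a_1 = 0.
        shrink = (1.0 - eta(n)) ** 2 if n >= 2 else 0.0
        a = shrink * a + theta ** 2 * b
        w = eta(n + 1)
        b = (1.0 - w) * b + w * a
        bs.append(b)
    return bs


small_theta = expected_variance_path(1.0, 0.01, 5000)
large_theta = expected_variance_path(1.0, 2.0, 200)
```

Under these assumptions a small scaling such as 0.01 produces a long initial decrease of the expected variance, in line with the 'dip' described for Figure 1, while a scaling of at least one gives an increasing sequence from the start.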
Observe that for any constant unit vector $u \in \mathbb{R}^d$, the recursions (9) and (10) hold
also for

$a^{(u)}_{n+1} := E\left[ u^T (X_{n+1} - M_n)(X_{n+1} - M_n)^T u \right]$

$b^{(u)}_{n+1} := E\left[ u^T S_{n+1} u \right].$

The rest of this section is therefore dedicated to the analysis of the one-dimensional
recursions (9) and (10), that is, $a_n, b_n \in \mathbb{R}_+$ for all $n \ge 1$. The first result shows that
the tail of $(b_n)_{n \ge 1}$ is increasing.

Lemma 10. Let $n_0 \ge 1$ and suppose $a_{n_0} \ge 0$, $b_{n_0} > 0$ and for $n \ge n_0$ the sequences
$a_n$ and $b_n$ follow the recursions (9) and (10), respectively. Then, there is a $m_0 \ge n_0$
such that $(b_n)_{n \ge m_0}$ is strictly increasing.

Proof. If $\theta \ge 1$, we may estimate $a_{n+1} \ge (1 - \eta_n)^2 a_n + b_n$ implying $b_{n+1} \ge b_n +
\eta_{n+1} (1 - \eta_n)^2 a_n$ for all $n \ge n_0$. Since $b_n > 0$ by construction, and therefore also
$a_{n+1} \ge \theta^2 b_n > 0$, we have that $b_{n+1} > b_n$ for all $n \ge n_0 + 1$.
Suppose then $\theta < 1$. Solving $a_{n+1}$ from (10) yields

$a_{n+1} = \eta_{n+1}^{-1} (b_{n+1} - b_n) + b_n.$

Substituting this into (9), we obtain for $n \ge n_0 + 1$

$\eta_{n+1}^{-1} (b_{n+1} - b_n) + b_n = (1 - \eta_n)^2 \left[ \eta_n^{-1} (b_n - b_{n-1}) + b_{n-1} \right] + \theta^2 b_n.$

After some algebraic manipulation, this is equivalent to

(11) $b_{n+1} - b_n = \frac{\eta_{n+1}}{\eta_n} (1 - \eta_n)^3 (b_n - b_{n-1}) + \eta_{n+1} \left[ (1 - \eta_n)^2 - 1 + \theta^2 \right] b_n.$
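The algebra behind this step can be spelled out. The derivation below is a reader's reconstruction, assuming that the two recursions read $a_{n+1} = (1-\eta_n)^2 a_n + \theta^2 b_n$ and $b_{n+1} = (1-\eta_{n+1}) b_n + \eta_{n+1} a_{n+1}$. The only manipulation used is the substitution $b_{n-1} = b_n - (b_n - b_{n-1})$ in the second line.

```latex
\begin{align*}
\eta_{n+1}^{-1}(b_{n+1}-b_n) + b_n
  &= (1-\eta_n)^2\bigl[\eta_n^{-1}(b_n-b_{n-1}) + b_{n-1}\bigr] + \theta^2 b_n, \\
b_{n+1}-b_n
  &= \frac{\eta_{n+1}}{\eta_n}(1-\eta_n)^2 (b_n-b_{n-1})
     + \eta_{n+1}(1-\eta_n)^2 b_{n-1} + \eta_{n+1}(\theta^2-1)\, b_n \\
  &= \frac{\eta_{n+1}}{\eta_n}(1-\eta_n)^2 (1-\eta_n)(b_n-b_{n-1})
     + \eta_{n+1}\bigl[(1-\eta_n)^2 - 1 + \theta^2\bigr] b_n .
\end{align*}
```

The cubic factor $(1-\eta_n)^3$ therefore arises from combining the $\eta_n^{-1}$ term with the part of $b_{n-1}$ that is expressed through the increment $b_n - b_{n-1}$.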

Now, since $\eta_n \to 0$, we have that $(1 - \eta_n)^2 - 1 + \theta^2 > 0$ whenever $n$ is greater than
some $n_1$. So, if we have for some $n' > n_1$ that $b_{n'} - b_{n'-1} \ge 0$, the sequence $(b_n)_{n \ge n'}$
is strictly increasing after $n'$.

Suppose conversely that $b_{n+1} - b_n < 0$ for all $n \ge n_1$. From (10), $b_{n+1} - b_n =
\eta_{n+1} (a_{n+1} - b_n)$ and hence $b_n > a_{n+1}$ for $n \ge n_1$. Consequently, from (9), $a_{n+1} >
(1 - \eta_n)^2 a_n + \theta^2 a_{n+1}$ which is equivalent to

$a_{n+1} > \frac{(1 - \eta_n)^2}{1 - \theta^2} a_n.$

Since $\eta_n \to 0$, there is a $\mu > 1$ and $n_2$ such that $a_{n+1} \ge \mu a_n$ for all $n \ge n_2$. That
is, $(a_n)_{n \ge n_2}$ grows at least geometrically, implying that after some time $a_{n+1} > b_n$,
which is a contradiction. To conclude, there is an $m_0 \ge n_0$ such that $(b_n)_{n \ge m_0}$ is
strictly increasing. □

Lemma 10 shows that the expectation $E[u^T S_n u]$ is ultimately bounded from below,
assuming only that $\eta_n \to 0$. By additional assumptions on the sequence $\eta_n$, the
growth rate can be characterised in terms of the adaptation weight sequence.

Assumption 11. Suppose $(\eta_n)_{n \ge 1} \subset (0, 1)$ and there is $m' \ge 2$ such that

(i) $(\eta_n)_{n \ge m'}$ is decreasing with $\eta_n \to 0$,

(ii) $(\eta_{n+1}^{-1/2} - \eta_n^{-1/2})_{n \ge m'}$ is decreasing and

(iii) $\sum_{n=2}^{\infty} \eta_n = \infty$.

The canonical example of a sequence satisfying Assumption 11 is the one assumed
in Section 3.1, $\eta_n := c n^{-\gamma}$ for $c \in (0, 1)$ and $\gamma \in (1/2, 1]$.
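The three conditions can be probed numerically for the canonical weights. The sketch below assumes that condition (ii) refers to the differences of the inverse square roots of consecutive weights. It only inspects a finite range of indices, so it illustrates the assumption rather than proving it, and the function name is hypothetical.

```python
def check_assumption_11(c, gamma, n_from=2, n_to=10000):
    """Check monotonicity conditions for eta_n = c * n**(-gamma) on a finite range."""
    eta = [c * n ** (-gamma) for n in range(n_from, n_to + 2)]
    inv_sqrt = [e ** -0.5 for e in eta]
    # (i) the weights decrease
    decreasing = all(eta[i + 1] < eta[i] for i in range(len(eta) - 1))
    # (ii) the increments of eta^(-1/2) decrease
    diffs = [inv_sqrt[i + 1] - inv_sqrt[i] for i in range(len(inv_sqrt) - 1)]
    diffs_decreasing = all(diffs[i + 1] <= diffs[i] for i in range(len(diffs) - 1))
    # (iii) partial sum; divergence itself cannot be verified on a finite range
    partial_sum = sum(eta)
    return decreasing, diffs_decreasing, partial_sum
```

For weights 1/n the partial sum grows like the logarithm of the upper index, which is the slow divergence required by condition (iii).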

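Theorem 12. Suppose $a_m \ge 0$ and $b_m > 0$ for some $m \ge 1$, and for $n > m$ the
$a_n$ and $b_n$ are given recursively by (9) and (10), respectively. Suppose also that the
sequence $(\eta_n)_{n \ge 2}$ satisfies Assumption 11 with some $m' \ge m$. Then, for all $\lambda > 1$
there is $m_2 \ge m'$ such that for all $n \ge m_2$ and $k \ge 1$, the following bounds hold

$\frac{1}{\lambda} \theta \sum_{j=n+1}^{n+k} \sqrt{\eta_j} \le \log\left( \frac{b_{n+k}}{b_n} \right) \le \lambda \theta \sum_{j=n+1}^{n+k} \sqrt{\eta_j}.$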

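Proof. Let $m_0$ be the index from Lemma 10 after which the sequence $b_n$ is increasing.
Let $m_1 > \max\{m_0, m'\}$ and define the sequence $(z_n)_{n \ge m_1 - 1}$ by setting $z_{m_1 - 1} = b_{m_1 - 1}$
and $z_{m_1} = b_{m_1}$, and for $n \ge m_1$ through the recursion

(12) $z_{n+1} = z_n + \frac{\eta_{n+1}}{\eta_n} (1 - \eta_n)^3 (z_n - z_{n-1}) + \eta_{n+1} \tilde{\theta}^2 z_n$

where $\tilde{\theta} > 0$ is a constant. Consider such a sequence $(z_n)_{n \ge m_1 - 1}$ and define another
sequence $(g_n)_{n \ge m_1 + 1}$ through

$g_{n+1} := \eta_{n+1}^{-1/2} \left( \frac{z_{n+1}}{z_n} - 1 \right)
= \eta_{n+1}^{-1/2} \left[ \frac{\eta_{n+1}}{\eta_n} (1 - \eta_n)^3 \frac{z_n - z_{n-1}}{z_n} + \eta_{n+1} \tilde{\theta}^2 \right]
= \eta_{n+1}^{1/2} \left[ \frac{(1 - \eta_n)^3}{\eta_n} \frac{g_n}{g_n + \eta_n^{-1/2}} + \tilde{\theta}^2 \right].$

Lemma 14 below shows that $g_n \to \tilde{\theta}$.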

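Let us consider next two sequences $(z_n^{(1)})_{n \ge m_1 - 1}$ and $(z_n^{(2)})_{n \ge m_1 - 1}$ defined as
$(z_n)_{n \ge m_1 - 1}$ above but using two different values $\tilde{\theta}^{(1)}$ and $\tilde{\theta}^{(2)}$, respectively. It is clear
from (11) that for the choice $\tilde{\theta}^{(1)} := \theta$ one has $b_n \le z_n^{(1)}$ for all $n \ge m_1 - 1$. Moreover,
since $b_{m_1 + 1}/b_{m_1} \le z^{(1)}_{m_1 + 1}/z^{(1)}_{m_1}$, it holds by induction that

$\frac{b_{n+1}}{b_n} \le 1 + \frac{\eta_{n+1}}{\eta_n} (1 - \eta_n)^3 \left( 1 - \frac{b_{n-1}}{b_n} \right) + \eta_{n+1} \theta^2
\le 1 + \frac{\eta_{n+1}}{\eta_n} (1 - \eta_n)^3 \left( 1 - \frac{z^{(1)}_{n-1}}{z^{(1)}_n} \right) + \eta_{n+1} \theta^2 = \frac{z^{(1)}_{n+1}}{z^{(1)}_n}$

also for all $n \ge m_1 + 1$. By a similar argument one shows that if $\tilde{\theta}^{(2)} := [(1 - \eta_{m_1})^2 -
1 + \theta^2]^{1/2}$ then $b_n \ge z_n^{(2)}$ and $b_{n+1}/b_n \ge z^{(2)}_{n+1}/z^{(2)}_n$ for all $n \ge m_1 - 1$.

Let $\lambda' > 1$. Since $g_n^{(1)} \to \tilde{\theta}^{(1)}$ and $g_n^{(2)} \to \tilde{\theta}^{(2)}$ there is a $m_2 \ge m_1$ such that the
following bounds apply

$1 + \frac{\tilde{\theta}^{(2)}}{\lambda'} \sqrt{\eta_n} \le \frac{b_n}{b_{n-1}}$ and $\frac{b_n}{b_{n-1}} \le 1 + \lambda' \tilde{\theta}^{(1)} \sqrt{\eta_n}$

for all $n \ge m_2$. Consequently, for all $n \ge m_2$, we have that

$\log\left( \frac{b_{n+k}}{b_n} \right) \le \sum_{j=n+1}^{n+k} \log\left( 1 + \lambda' \tilde{\theta}^{(1)} \sqrt{\eta_j} \right) \le \lambda' \tilde{\theta}^{(1)} \sum_{j=n+1}^{n+k} \sqrt{\eta_j}.$

Similarly, by the mean value theorem

$\log\left( \frac{b_{n+k}}{b_n} \right) \ge \sum_{j=n+1}^{n+k} \log\left( 1 + \frac{\tilde{\theta}^{(2)}}{\lambda'} \sqrt{\eta_j} \right)
\ge \frac{\tilde{\theta}^{(2)}}{\lambda'} \left( 1 + \frac{\tilde{\theta}^{(2)}}{\lambda'} \sqrt{\eta_n} \right)^{-1} \sum_{j=n+1}^{n+k} \sqrt{\eta_j}$

since $\eta_n$ is decreasing. By letting the constant $m_1$ above be sufficiently large, the
difference $|\tilde{\theta}^{(2)} - \theta|$ can be made arbitrarily small, and by increasing $m_2$, the constant
$\lambda' > 1$ can be chosen arbitrarily close to one. □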

Before Lemma 14, let us establish some properties of the weight sequence $(\eta_n)_{n \ge 1}$
satisfying Assumption 11.

Lemma 13. Suppose $(\eta_n)_{n \ge 1}$ satisfies Assumption 11. Then,
(a) $(\eta_{n+1}/\eta_n)_{n \ge m'}$ is increasing with $\eta_{n+1}/\eta_n \to 1$ and

Proof. Define $a_n := \eta_n^{-1/2}$ for all $n \ge m'$. By Assumption 11 (i) $(a_n)_{n \ge m'}$ is increasing
and by Assumption 11 (ii) $(\Delta a_n)_{n \ge m' + 1}$ is decreasing, where $\Delta a_n := a_n - a_{n-1}$. One
can write

$\frac{a_n}{a_{n+1}} = \frac{1}{1 + \Delta a_{n+1}/a_n} \ge \frac{1}{1 + \Delta a_n/a_{n-1}} = \frac{a_{n-1}}{a_n}$

implying that $(\eta_{n+1}/\eta_n)_{n \ge m'}$ is increasing. Denote $c = \lim_{n \to \infty} \eta_{n+1}/\eta_n \le 1$. It holds
that $\eta_{m'+k} \le c \eta_{m'+k-1} \le \cdots \le c^k \eta_{m'}$. If $c < 1$, then $\sum_n \eta_n < \infty$ contradicting
Assumption 11 (iii), so $c$ must be one, establishing (a).
From (taj), one obtains 



-1/2 _ -1/2 ^ ^ 1/2 

Vn+1 Vn _ ( Vn 



-1/2 \ n , 







implying (jb]). □ 

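Lemma 14. Suppose $m_1 \ge 1$, $g_{m_1} \ge 0$, the sequence $(\eta_n)_{n \ge m_1}$ satisfies Assumption
11 and $\tilde{\theta} > 0$ is a constant. The sequence $(g_n)_{n > m_1}$ defined through

$g_{n+1} := \eta_{n+1}^{1/2} \left[ \frac{(1 - \eta_n)^3}{\eta_n} \frac{g_n}{g_n + \eta_n^{-1/2}} + \tilde{\theta}^2 \right]$

satisfies $\lim_{n \to \infty} g_n = \tilde{\theta}$.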

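Proof. Define the functions $f_n : \mathbb{R}_+ \to \mathbb{R}_+$ for $n \ge m_1 + 1$ by

$f_{n+1}(x) := \eta_{n+1}^{1/2} \left[ \frac{(1 - \eta_n)^3}{\eta_n} \frac{x}{x + \eta_n^{-1/2}} + \tilde{\theta}^2 \right].$

The functions $f_n$ are contractions on $[0, \infty)$ with contraction coefficient $q_n := (1 - \eta_{n-1})^3$
since for all $x, y \ge 0$

$|f_{n+1}(x) - f_{n+1}(y)| = \eta_{n+1}^{1/2} \frac{(1 - \eta_n)^3}{\eta_n} \left| \frac{x}{x + \eta_n^{-1/2}} - \frac{y}{y + \eta_n^{-1/2}} \right|
\le \left( \frac{\eta_{n+1}}{\eta_n} \right)^{1/2} (1 - \eta_n)^3 |x - y| \le (1 - \eta_n)^3 |x - y|$

where the second inequality holds since $\eta_{n+1} \le \eta_n$.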
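The fixed point of $f_{n+1}$ can be written as

$x^*_{n+1} := \frac{1}{2} \left( \sqrt{\xi_{n+1}^2 + \mu_{n+1}} - \xi_{n+1} \right)$

where

$\xi_{n+1} := \eta_n^{-1/2} - \eta_{n+1}^{1/2} \eta_n^{-1} (1 - \eta_n)^3 - \eta_{n+1}^{1/2} \tilde{\theta}^2$

$\mu_{n+1} := 4 \eta_{n+1}^{1/2} \eta_n^{-1/2} \tilde{\theta}^2.$

Lemma 13 (a) implies $\mu_{n+1} \to 4 \tilde{\theta}^2$. Moreover,

$\xi_{n+1} = \eta_n^{-1/2} - \eta_{n+1}^{1/2} \eta_n^{-1} + \eta_{n+1}^{1/2} (3 - 3 \eta_n + \eta_n^2 - \tilde{\theta}^2)
= \eta_{n+1}^{1/2} \eta_n^{-1/2} \left( \eta_{n+1}^{-1/2} - \eta_n^{-1/2} \right) + \eta_{n+1}^{1/2} (3 - 3 \eta_n + \eta_n^2 - \tilde{\theta}^2).$

Therefore, by Assumption 11 (i) and Lemma 13, $\xi_{n+1} \to 0$ and consequently the fixed
points satisfy $x^*_n \to \tilde{\theta}$.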

Consider next the consecutive differences of the fixed points. Using the mean value 
theorem and the triangle inequality, write 



2 \x^_^i xA < l^n+l Cn\ + 



l^n+1 ^n ~^ f^n+1 l^nl 



^ l'Cn+1 — C,n\ H ;= l'Cn+1 ~ ^n \ + 



l/^n+1 ~ fJ-n] 



^ Ci l^n+l — C,n\ + Ci l/^n+l — fir, 



where the value of r„ is between ^^^^ + fin+i and + /i„ converging to 4^^^ > 0, the 
value of r/j is between |.^n+i| and converging to zero, and ci > is a constant. 
The differences of the latter terms satisfy for all $m \ge m'$
n=m' n=m'



< 



1/2 / \ 1/2 

Vn+1 \ ( Vn ^ 



V: 



1/2 



Vm'-l 



by Assumption [TT] (juj) and Lemma US] Q • For the first term, let us estimate 



'In / ^ ^ \'/n— 1/ ^ 



1/2 _ 1/2 
'7n ^771+1 



+ 



Vn+li^Vn - Vn) " Vn^{^Vn-l " ^/^-l) 



Assumption [TT] ([i]) implies that r^^^ — tjI/^i > for n > m' and hence 



E 



m 

n=m' 



1/2 _ 1/2 



— Vm^ for si'iiy ^ Since the function (x, ?/) t— x(3?/ — y^) 
is Lipschitz on [0, 1]^, there is a constant C2 independent of n such that \'r]n+i{'^iln — 
vl) - Vn'^i^Vn^i - vl~i)\ < C2{\r]n+i - "i]n'^\ + |?7n " ^?n-i|), and a similar argument 



shows that 



< C3 < CX). 



One can also estimate 



^n+\ 



< C4 



1/2 



Vn 



Vn+1 
1/2 



1/2 _ „-l/2 
In 



Vn~l 



1/2 



Vn^l 



1/2 



-1/2 _ "1/2 
Vn Vn-l 



„-V2_ -1/2 
'/n+1 '/n 



-1/2 -1/2 
Tin ' -VnA 



yielding by Assumption 11 and Lemma 13 that $\sum_{n=m'}^{m} |\xi_{n+1} - \xi_n| \le c_5$ for all
$m \ge m'$, with a constant $c_5 < \infty$. Combining the above estimates, the fixed point
differences satisfy
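Fix a $\delta > 0$ and let $n_\delta > m_1$ be sufficiently large so that $\sum_{k=n_\delta+1}^{\infty} |x^*_k - x^*_{k-1}| \le \delta$
implying also that $|x^*_n - \tilde{\theta}| \le \delta$ for all $n \ge n_\delta$. Then, for $n > n_\delta$ one may write

$|g_n - \tilde{\theta}| \le |g_n - x^*_n| + |x^*_n - \tilde{\theta}| \le |f_n(g_{n-1}) - f_n(x^*_n)| + \delta$

$\le q_n |g_{n-1} - x^*_n| + \delta \le q_n |g_{n-1} - x^*_{n-1}| + |x^*_n - x^*_{n-1}| + \delta$

$\le q_n q_{n-1} |g_{n-2} - x^*_{n-2}| + |x^*_{n-1} - x^*_{n-2}| + |x^*_n - x^*_{n-1}| + \delta$

$\le \cdots \le \left( \prod_{k=n_\delta+1}^{n} q_k \right) |g_{n_\delta} - x^*_{n_\delta}| + 2\delta.$

Since $\log \prod_{k=n_\delta+1}^{n} q_k = 3 \sum_{k=n_\delta+1}^{n} \log(1 - \eta_{k-1}) \le -3 \sum_{k=n_\delta}^{n-1} \eta_k \to -\infty$ as $n \to \infty$ by
Assumption 11 (iii), it holds that $\left( \prod_{k=n_\delta+1}^{n} q_k \right) |g_{n_\delta} - x^*_{n_\delta}| \to 0$. That is, $|g_n - \tilde{\theta}| \le 3\delta$
for any sufficiently large $n$, and since $\delta > 0$ was arbitrary, $g_n \to \tilde{\theta}$. □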



3.3. Uniform Target: Path-wise Behaviour. Section 3.2 characterised the behaviour
of the sequence $E[S_n]$ when the chain $(X_n)_{n \ge 2}$ follows the 'adaptive random
walk' recursion (6). In this section, we shall verify that almost every sample path
$(S_n)_{n \ge 1}$ of the same process is increasing.
Fix a unit vector $u \in \mathbb{R}^d$ and define the scalar process $(Z_n)_{n \ge 2}$ through

(13) $Z_{n+1} := \frac{u^T (X_{n+1} - M_n)}{\|S_n^{1/2} u\|}$

where $\|x\| := \sqrt{x^T x}$ stands for the Euclidean norm. The behaviour of the process
$(Z_n)_{n \ge 2}$ determines the behaviour of $(u^T S_n u)_{n \ge 2}$ since one can write a recursion for
$(u^T S_n u)_{n \ge 2}$ using only $(Z_n)_{n \ge 2}$

(14) $u^T S_{n+1} u = (1 - \eta_{n+1}) u^T S_n u + \eta_{n+1} u^T (X_{n+1} - M_n)(X_{n+1} - M_n)^T u
= [1 + \eta_{n+1} (Z_{n+1}^2 - 1)] u^T S_n u.$
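The multiplicative form of the update can be confirmed numerically. The sketch below works in dimension two with plain Python lists, assumes that the scalar process is the projection of the centred new point onto u divided by the square root of the quadratic form of the current covariance, and uses a hypothetical function name. It draws a random positive definite matrix, a random unit vector and a random increment, then evaluates both sides of the identity.

```python
import math
import random


def quad_form(u, s):
    """u^T s u for a 2x2 matrix s stored as nested lists."""
    return sum(u[i] * s[i][j] * u[j] for i in range(2) for j in range(2))


def check_projected_update(seed=0, eta=0.3):
    rng = random.Random(seed)
    # random positive definite matrix B B^T + 0.1 I, standing for S_n
    b = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
    s = [[sum(b[i][k] * b[j][k] for k in range(2)) + (0.1 if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    angle = rng.uniform(0.0, 2.0 * math.pi)
    u = [math.cos(angle), math.sin(angle)]
    diff = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # stands for X_{n+1} - M_n
    # covariance update with weight eta
    s_next = [[(1.0 - eta) * s[i][j] + eta * diff[i] * diff[j] for j in range(2)]
              for i in range(2)]
    # scalar process: projection divided by sqrt(u^T S_n u)
    z = (u[0] * diff[0] + u[1] * diff[1]) / math.sqrt(quad_form(u, s))
    lhs = quad_form(u, s_next)
    rhs = (1.0 + eta * (z * z - 1.0)) * quad_form(u, s)
    return lhs, rhs
```

The two returned numbers agree up to rounding for any seed and any weight in (0, 1), which is the identity that allows the projected variance to be tracked through the scalar process alone.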

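On the other hand, one can express $(Z_n)_{n \ge 2}$ in terms of $(W_n)_{n \ge 2}$ and $(S_n)_{n \ge 1}$

$Z_{n+1} = \theta \frac{u^T S_n^{1/2} W_{n+1}}{\|S_n^{1/2} u\|} + (1 - \eta_n) \frac{u^T (X_n - M_{n-1})}{\|S_n^{1/2} u\|}
= \theta \frac{u^T S_n^{1/2}}{\|S_n^{1/2} u\|} W_{n+1} + (1 - \eta_n) Z_n \left( \frac{u^T S_{n-1} u}{u^T S_n u} \right)^{1/2}.$

Using (14), this simplifies to

(15) $Z_{n+1} = \theta \tilde{W}_{n+1} + U_n Z_n$

where

$\tilde{W}_{n+1} := \frac{u^T S_n^{1/2}}{\|S_n^{1/2} u\|} W_{n+1}$ and $U_n := (1 - \eta_n) \left[ 1 + \eta_n (Z_n^2 - 1) \right]^{-1/2}.$

Let us observe first that $(\tilde{W}_n)_{n \ge 2}$ are independent if the distribution $\tilde{q}$ of $(W_n)_{n \ge 2}$ is
spherically symmetric.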

Lemma 15. Assume $(W_n)_{n \ge 1}$ are independent and follow a spherically symmetric
non-degenerate distribution in $\mathbb{R}^d$. Then $(\tilde{W}_n)_{n \ge 1}$ are independent and identically
distributed non-degenerate real-valued random variables.

Proof. Choose a measurable $A \subset \mathbb{R}$, denote $T_n := \|S_n^{1/2} u\|^{-1} S_n^{1/2} u$ and define $A_n :=
\{x \in \mathbb{R}^d : T_n^T x \in A\}$. Let $R_n$ be a rotation matrix such that $R_n^T T_n = e_1 :=
(1, 0, \ldots, 0) \in \mathbb{R}^d$. Since $W_{n+1}$ is independent of $\mathcal{F}_n$, we have

$P(\tilde{W}_{n+1} \in A \mid \mathcal{F}_n) = P(W_{n+1} \in A_n \mid T_n) = P(R_n W_{n+1} \in A_n \mid T_n)
= P(e_1^T W_{n+1} \in A \mid T_n) = P(e_1^T W_1 \in A)$

by the rotational invariance of the distribution of $(W_n)_{n \ge 1}$. □

Notice particularly that if $(W_n)_{n \ge 2}$ are standard Gaussian vectors in $\mathbb{R}^d$ then
$(\tilde{W}_n)_{n \ge 2}$ are standard Gaussian random variables.

Only values $|Z_n| < 1$ can decrease $S_n$ as shown by (14). But if both $\eta_n$ and $\eta_n Z_n^2$ are
small, the variable $U_n$ is clearly close to unity, and consequently $Z_n$ behaves almost
as a random walk. Let us consider an auxiliary result quantifying the behaviour of
this random walk.

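Lemma 16. Let $n_0 \ge 2$, suppose $\tilde{Z}_{n_0 - 1}$ is an $\mathcal{F}_{n_0 - 1}$-measurable random variable and
suppose $(\tilde{W}_n)_{n \ge n_0}$ are respectively $(\mathcal{F}_n)_{n \ge n_0}$-measurable and non-degenerate i.i.d. random
variables. Define $\tilde{Z}_n$ for $n \ge n_0$ through

$\tilde{Z}_{n+1} = \tilde{Z}_n + \theta \tilde{W}_{n+1}.$

Then, for any $N, \delta_1, \delta_2 > 0$, there is a $k_0 \ge 1$ such that

$P\left( \frac{1}{k} \sum_{j=1}^{k} 1_{\{|\tilde{Z}_{n+j}| \le N\}} \ge \delta_1 \,\Big|\, \mathcal{F}_n \right) \le \delta_2$

a.s. for all $n \ge 1$ and $k \ge k_0$.

Proof. From the Kolmogorov-Rogozin inequality, Theorem 34 in the Appendix,

$P(\tilde{Z}_{n+j} - \tilde{Z}_n \in [x, x + 2N] \mid \mathcal{F}_n) \le c_1 j^{-1/2}$

for any $x \in \mathbb{R}$, where the constant $c_1 > 0$ depends on $N$, $\theta$ and on the distribution of
$\tilde{W}_j$. In particular, since $\tilde{Z}_{n+j} - \tilde{Z}_n$ is independent of $\tilde{Z}_n$, one may set $x = -\tilde{Z}_n - N$
above, and thus $P(|\tilde{Z}_{n+j}| \le N \mid \mathcal{F}_n) \le c_1 j^{-1/2}$. The estimate

$E\left[ \frac{1}{k} \sum_{j=1}^{k} 1_{\{|\tilde{Z}_{n+j}| \le N\}} \,\Big|\, \mathcal{F}_n \right] \le \frac{c_1}{k} \sum_{j=1}^{k} j^{-1/2} \le c_2 k^{-1/2}$

implies $P\left( k^{-1} \sum_{j=1}^{k} 1_{\{|\tilde{Z}_{n+j}| \le N\}} \ge \delta_1 \mid \mathcal{F}_n \right) \le \delta_1^{-1} c_2 k^{-1/2}$, concluding the proof. □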
The technical estimate in the next Lemma 18 makes use of the above mentioned
random walk approximation and guarantees ultimately a positive 'drift' for the eigenvalues
of $S_n$. The result requires that the adaptation sequence $(\eta_n)_{n \ge 2}$ is 'smooth' in
the sense that the quotients converge to one.

Assumption 17. The adaptation weight sequence $(\eta_n)_{n \ge 2} \subset (0, 1)$ satisfies

$\lim_{n \to \infty} \frac{\eta_{n+1}}{\eta_n} = 1.$

Lemma 18. Let $n_0 \ge 2$, suppose $Z_{n_0 - 1}$ is $\mathcal{F}_{n_0 - 1}$-measurable, and assume $(Z_n)_{n \ge n_0}$
follows (15) with non-degenerate i.i.d. variables $(\tilde{W}_n)_{n \ge n_0}$ measurable with respect to
$(\mathcal{F}_n)_{n \ge n_0}$, respectively, and the adaptation weights $(\eta_n)_{n \ge n_0}$ satisfy Assumption 17.
Then, for any $C \ge 1$ and $\epsilon > 0$, there are indices $k \ge 1$ and $n_1 \ge n_0$ such that
$P(L_{n,k} \mid \mathcal{F}_n) \le \epsilon$ a.s. for all $n \ge n_1$, where

$L_{n,k} := \left\{ \sum_{j=1}^{k} \log\left[ 1 + \eta_{n+j} (Z_{n+j}^2 - 1) \right] < k C \eta_n \right\}.$

Proof. Fix 7 G (0, 2/3) and assume < ri~'^. One may estimate 

/ y2 \ 1/2 



>{l-Vn 



■'n,k 



l-Vn + VnZl 
,1/2 / 1 V« 



\ ^ 'In / 

where Ci := 2sup„>„|j(l — r/n)"^^^ < oo. Observe also that Un < 1. 

Let ko > 1 he from Lemma [16] applied with = v8C + l + 1, 5i = 1/8 and 
^2 = e, and fix > fco + 1- Let n > hq and define an auxiliary process (^j"'*)j>n.o-i 
as Z^P^ = for riQ — l<j<n + l., and for j > n + 1 through 

zf^ = z^^, + e J2 w^. 

i=n+2 

For any n + 2<j<n + k and a; G := (^i=n+i{^i — Vi'^}^ the difference of zj"-* 
and Zj can be bounded by 

\Z^t - Z,+i| < \Z,\\1 - U,\ + |zf - Z,\ < cir^j-^" + |zf - Z,| < ■ • • 

i 3 3 / \ 1"!')' 3 

< ci ^ ^'^ < Ci?7n ''^ X] ( — ) < C2(j - ra)?7„ 2^ 
II ■ 1 1 V / 



i=n+l i=n+l 



< 



by Assumption 17. Therefore, for sufficiently large $n \ge n_0$, the inequality $|\tilde{Z}_j^{(n)} - Z_j| \le 1$
holds for all $n \le j \le n + k$ and $\omega \in A_{n:n+k}$. Now, if $\omega \in A_{n:n+k}$, the following
bound holds

log [l+Vj{Z] - 1)] > log [1 +r/,(mm{iV, \Z,\y - 1)] 

^ l{|zW|>iV} log [1 + vAiN - 1)2 - 1)] + log [1 - r],] 

by the mean value theorem, where the constant Pj = Pj{C, rjj) G (0, 1) can be selected 
arbitrarily small whenever j is sufficiently large. Using this estimate, one can write 

for U G An:n+k 

k k 
J2 log [1 + Vn+j [Zl^, - 1)] > (1 - (3n) Yl - (1 + E ^"+^- 

where J+^i^;. := {j G [1, A;] : zi"^ > A^}. Define the sets 



5n,fc := <; Y E l{|Z„+,+i|<iV} < 



Within it clearly holds that #1^+^,^ >k-l-{k- l)6i = 7{k - l)/8. Thereby, 
for all io G Bn,k n A^-.n+k 

k 

J] log [1 + Vn-,j {Zl+j - I)] 



{l-Pn)l( inf ^V-(l+/^n)f 



2 1 I \ • I It/ \ i 

\^<j<k r]n J \l<j<k V'. 



for sufficiently large n > 1, as then the constant /5„ can be chosen small enough, and 
by Assumption [T71 In other words, if n > 1 is sufficiently large, then Bn,k H An-.n+k H 
Ln,k = 0- 

Let us then write the conditional expectation of interest in parts, 

P iLn,k I -^n) = P {Ln^ki An;n+k \ -^n) + P iLn,ki A^ | J- n) 

(16) n+k 

+ ^ ^ ^ (Ln^ky An-i-l, A'- I J- n) 
i=n+l 

where A[ := {Zf > 'r]~^}- Let G A'^ for any n < i < n + k and compute 
log [1 + riiiZf - 1)] > log [1 + ruivP - 1)] > log |1 + ^V.kC] 

> ^!>*£- > 

- 1 + 2r]ikC - ' 

whenever n > uq is sufficiently large, since — > 0, and by Assumption [T71 That 
is, if n is sufficiently large, all but the first term in the right hand side of f|T6l) are 




a.s. zero. It remains to show the inequahty for the first, for which the estimate 

<P(<J Tn)<e 

holds by Lemma [IBl concluding the proof. □ 

Using the estimate of Lemma [18], it is relatively easy to show that the eigenvalues 
of Sn tend to infinity, if the adaptation weights satisfy an additional assumption. 

Assumption 19. The adaptation weight sequence $(\eta_n)_{n \ge 2} \subset (0, 1)$ is in $\ell^2$ but not
in $\ell^1$, that is,

$\sum_{n=2}^{\infty} \eta_n = \infty$ and $\sum_{n=2}^{\infty} \eta_n^2 < \infty.$

Theorem 20. Assume that $(X_n)_{n \ge 2}$ follows the 'adaptive random walk' recursion (6)
and the adaptation weights $(\eta_n)_{n \ge 2}$ satisfy Assumptions 17 and 19. Then, for any
unit vector $u \in \mathbb{R}^d$, the process $u^T S_n u \to \infty$ almost surely.

Proof. The proof is based on the estimate of Lemma 18 applied with a similar
martingale argument as in [18].



Let > 2 be from Lemma [T8] applied with C = 4 and e = 1/2. Denote ^i := ki + 1 
for i > and, inspired by ([SD, define the random variables (Tj)j>i by 

r 4 

:=minhMr/,^_,, ^ log [l + r/, (Z| - l)] 

with the convention that ?7o = 1. Form a martingale (Fj, Gi)i>i with = and 
having differences dFj := Tj — E [Tj | Qi-i] and where = {0, Vt] and := J^t^ for 



i > 1. By Assumption! 

oo oo 



1=2 1=1 



with a constant c = c{k, C) > 0, so Yi is a L^-martingale and converges a.s. to a finite 
limit Moo [e.g. llO], Theorem 2.15]. 

By Lemma [T8l the conditional expectation satisfies 

4+1 

E [T,+i I g;^] > A;C%(1 -e)+ J2 log(l " ^i)^ ^ 

j=4+l 

when i is large enough, and where the second inequality is due to Assumption [T7] This 
implies, with Assumption [T9l that I Gi-i] = oo a.s., and since Yi converges 

a.s. to a finite limit, it holds that Yli'^i ~ ^ ^-S- 




By ( IT4l) . one may estimate for any n = im with m > 1 that 

m 

log{u^ Snu) > log{u^ Siu) + Tj ^ oo 

i=l 

as m — oo. Simple deterministic estimates conclude the proof for the intermediate 
values of n. □ 

3.4. Stability with One-Dimensional Uniformly Continuous Log-Density.

In this section, the above analysis of the 'adaptive random walk' is extended to
imply that $\liminf_{n \to \infty} S_n > 0$ for the one-dimensional AM algorithm, assuming $\log \pi$
uniformly continuous. The result follows similarly as in Theorem 20, by coupling
the AM process with the 'adaptive random walk' whenever $S_n$ is smaller than some
constant $\mu > 0$.

Theorem 21. Assume $d = 1$ and $\log \pi$ is uniformly continuous, and that the adaptation
weights $(\eta_n)_{n \ge 2}$ satisfy Assumptions 17 and 19. Then, there is a constant $b > 0$
such that $\liminf_{n \to \infty} S_n \ge b$.

Proof. Fix a $\delta \in (0, 1)$. Due to the uniform continuity of $\log \pi$, there is a $\delta_1 > 0$ such
that

\[
\log \pi(y) - \log \pi(x) \ge \frac{1}{2} \log\Big(1 - \frac{\delta}{2}\Big)
\]

for all $|x - y| \le \delta_1$. Choose $M > 0$ sufficiently large so that $\int_{-M}^{M} q(z)\,dz \ge \sqrt{1 - \delta/2}$. Denote by

\[
Q_q(x, A) := \int_A q(y - x)\,dy
\]

the random walk transition kernel with increment distribution $q$, and observe that
the 'adaptive random walk' recursion can be written as $X_{n+1} \sim Q_{q_{S_n}}(X_n, \cdot)$.

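For any $x \in \mathbb{R}^d$ and measurable $A \subset \mathbb{R}^d$,

\[
\begin{aligned}
|Q_{q_s}(x, A) - P_{q_s}(x, A)| &\le 2\Big[1 - \int \min\Big\{1, \frac{\pi(y)}{\pi(x)}\Big\} q_s(y - x)\,dy\Big] \\
&\le 2\Big[1 - \int_{\{|z| \le M\}} \min\Big\{1, \frac{\pi(x + \sqrt{\theta s}\,z)}{\pi(x)}\Big\} q(z)\,dz\Big].
\end{aligned}
\]

Now, $|Q_{q_s}(x, A) - P_{q_s}(x, A)| \le \delta$ whenever $\sqrt{\theta s}\,|z| \le \delta_1$ for all $|z| \le M$. In other
words, there exists a $\mu = \mu(\delta) > 0$ such that whenever $s < \mu$, the total variation
norm $\|Q_{q_s}(x, \cdot) - P_{q_s}(x, \cdot)\| \le \delta$.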
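The quantity controlling the bound above is the rejection probability of the Metropolis step, which becomes small as $s \to 0$ when $\log \pi$ is uniformly continuous. The sketch below evaluates it by simple quadrature, assuming a standard Laplace target, a standard Gaussian template proposal and $\theta = 1$; the function name, grid size and truncation point are hypothetical choices made for illustration.

```python
import numpy as np

def rejection_probability(x, s, n_grid=20001, z_max=10.0):
    """Rejection probability of one random-walk Metropolis step from x.

    Assumptions (illustrative): target log pi(x) = -|x| + const,
    proposal y = x + sqrt(s) * z with z ~ N(0, 1), and theta = 1.
    """
    z = np.linspace(-z_max, z_max, n_grid)
    phi = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # log(pi(y) / pi(x)) for the Laplace target
    log_ratio = np.abs(x) - np.abs(x + np.sqrt(s) * z)
    accept = np.minimum(1.0, np.exp(log_ratio))
    dz = z[1] - z[0]
    # 1 - integral of min{1, pi(y)/pi(x)} against the proposal density
    return 1.0 - np.sum(accept * phi) * dz
```

For this target, $|\log\pi(y) - \log\pi(x)| \le |y - x|$, so the rejection probability is at most $\sqrt{s}\,\mathbb{E}|Z| = \sqrt{2s/\pi}$, which is consistent with the kernels $Q$ and $P$ being close in total variation for small $s$.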

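Let $n, k \ge 1$ and define the random variables $(\tilde X_j^{(n)}, \tilde M_j^{(n)}, \tilde S_j^{(n)})_{j \in [n, n+k]}$ by setting
$(\tilde X_n^{(n)}, \tilde M_n^{(n)}, \tilde S_n^{(n)}) \equiv (X_n, M_n, S_n)$ and

\[
\begin{aligned}
\tilde X_{j+1}^{(n)} &\sim Q_{q_{\tilde S_j^{(n)}}}(\tilde X_j^{(n)}, \cdot), \\
\tilde M_{j+1}^{(n)} &:= (1 - \eta_{j+1}) \tilde M_j^{(n)} + \eta_{j+1} \tilde X_{j+1}^{(n)}, \quad\text{and} \\
\tilde S_{j+1}^{(n)} &:= (1 - \eta_{j+1}) \tilde S_j^{(n)} + \eta_{j+1} \big(\tilde X_{j+1}^{(n)} - \tilde M_j^{(n)}\big)^2
\end{aligned}
\]

for $j + 1 \in [n + 1, n + k]$. The variable $\tilde X_{n+1}^{(n)}$ can be selected so that
$\mathbb{P}(\tilde X_{n+1}^{(n)} = X_{n+1} \mid \mathcal{F}_n) = 1 - \|P_{q_{S_n}}(X_n, \cdot) - Q_{q_{\tilde S_n^{(n)}}}(\tilde X_n^{(n)}, \cdot)\|$; see Theorem 35 in Appendix B.

Consequently, $\mathbb{P}(\tilde X_{n+1}^{(n)} \ne X_{n+1},\ S_n < \mu \mid \mathcal{F}_n) \le \delta$. By the same argument, $\tilde X_{n+2}^{(n)}$ can
be chosen so that the corresponding bound holds at step $n + 2$,
since if $\tilde X_{n+1}^{(n)} = X_{n+1}$, then also $\tilde S_{n+1}^{(n)} = S_{n+1}$. This implies

\[
\mathbb{P}\Big( \big( \{\tilde X_{n+2}^{(n)} \ne X_{n+2}\} \cup \{\tilde X_{n+1}^{(n)} \ne X_{n+1}\} \big) \cap B_{n:n+2} \;\Big|\; \mathcal{F}_n \Big) \le 2\delta
\]

where $B_{n:j} := \bigcap_{i=n}^{j-1} \{S_i < \mu\}$ for $j > n$. The same argument can be repeated to
construct $(\tilde X_j^{(n)})_{j \in [n, n+k]}$ so that

\[
\mathbb{P}(D_{n:n+k} \mid \mathcal{F}_n) \ge 1 - k\delta \tag{17}
\]

where $D_{n:n+k} := \bigcap_{j=n+1}^{n+k} \{\tilde X_j^{(n)} = X_j\} \cup B_{n:n+k}^{\complement}$.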

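Apply Lemma 18 with $C = 18$ and $\epsilon = 1/6$ to obtain $k \ge 1$, and fix $\delta = \epsilon/k$.
Denote $\ell_i := ik + 1$ for any $i \ge 0$, and define the random variables $(T_i)_{i \ge 1}$ by

\[
T_i := \mathbb{1}_{\{S_{\ell_{i-1}} < \mu/2\}} \min\Big\{ kM\eta_{\ell_{i-1}},\ \sum_{j=\ell_{i-1}+1}^{\ell_i} \log\big[1 + \eta_j (Z_j^2 - 1)\big] \Big\} \tag{18}
\]

where $Z_j$ are defined as in (15).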

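Define also $\tilde T_i$ similarly as $T_i$, but having $\tilde Z_j^{(\ell_{i-1})}$ with $j \in [\ell_{i-1} + 1, \ell_i]$ in the right
hand side of (18), defined as $\tilde Z_{\ell_{i-1}}^{(\ell_{i-1})} \equiv Z_{\ell_{i-1}}$ and by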



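for $j \in [\ell_{i-1} + 1, \ell_i]$. Notice that $\tilde T_i$ coincides with $T_i$ in $B_{\ell_{i-1}:\ell_i} \cap D_{\ell_{i-1}:\ell_i}$. Observe also
that $\tilde X_j^{(\ell_{i-1})}$ follows the 'adaptive random walk' equation (6) for $j \in [\ell_{i-1} + 1, \ell_i]$, and
hence $\tilde Z_j^{(\ell_{i-1})}$ follows (15). Consequently, denoting $\mathcal{G}_i := \mathcal{F}_{\ell_i}$, Lemma 18 guarantees
that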

(19) P(^^.-i,fc I 

where L^,_i,fc := {fi < kMr]i^_^}. 

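Let us show next that whenever $S_{\ell_{i-1}}$ is small, the variable $T_i$ is expected to have
a positive value proportional to the adaptation weight,

\[
\mathbb{E}[T_i \mid \mathcal{G}_{i-1}]\, \mathbb{1}_{\{S_{\ell_{i-1}} < \mu/2\}} \ge k\eta_{\ell_{i-1}} \mathbb{1}_{\{S_{\ell_{i-1}} < \mu/2\}} \tag{20}
\]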




almost surely for any sufficiently large i > 1. Write first 
E \Ti I g,



E 



1 



> E 



{5,^_^<m/2} 

l{5,^_^</./2} 



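where the lower bound of $T_i$ is given as

\[
\xi_i := \sum_{j=\ell_{i-1}+1}^{\ell_i} \log(1 - \eta_j).
\]

By Assumption 17, $\xi_i \ge -2k\eta_{\ell_{i-1}}$ for any sufficiently large $i$. Therefore,
whenever $\mathbb{P}(B_{\ell_{i-1}:\ell_i}^{\complement} \mid \mathcal{G}_{i-1}) \ge \epsilon = 3/C$, it holds that

\[
\mathbb{E}[T_i \mid \mathcal{G}_{i-1}]\, \mathbb{1}_{\{S_{\ell_{i-1}} < \mu/2\}} \ge k\eta_{\ell_{i-1}} \mathbb{1}_{\{S_{\ell_{i-1}} < \mu/2\}}
\]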

for any sufficiently large i. On the other hand, if P 
defining 

one has by (fT7|) and (fT9|) that P(-Ej) < 3e, and consequently 



^i-i ) < e, then by 



j + E 

> Se^i + (1 - 3e)A;C%_, > 

This establishes (20).

Define the stopping times $\tau_1 \equiv 1$ and for $n \ge 2$ through $\tau_n := \inf\{i > \tau_{n-1} :
S_{\ell_{i-1}} \ge \mu/2,\ S_{\ell_i} < \mu/2\}$ with the convention that $\inf \emptyset = \infty$. That is, $\tau_i$ record
the times when $S_{\ell_i}$ enters $(0, \mu/2]$. Using $\tau_i$, define the latest such time up to $n$ by
$\sigma_n := \sup\{\tau_i : i \ge 1,\ \tau_i \le n\}$. As in Theorem 20, define the almost surely converging
martingale $(Y_i, \mathcal{G}_i)_{i \ge 1}$ with $Y_1 \equiv 0$ and having the differences $dY_i := (T_i - \mathbb{E}[T_i \mid \mathcal{G}_{i-1}])$
for $i \ge 2$.

It is sufficient to show that $\liminf_{i\to\infty} S_{\ell_i} \ge b := \mu/4 > 0$ almost surely. If there is
a finite $i_0 \ge 1$ such that $S_{\ell_i} \ge \mu/2$ for all $i \ge i_0$, the claim is trivial. Let us consider
for the rest of the proof the case that $\{S_{\ell_i} < \mu/2\}$ happens for infinitely many indices
$i \ge 1$.

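For any $m \ge 2$ such that $S_{\ell_m} < \mu/2$, one can write

\[
\begin{aligned}
\log S_{\ell_m} &\ge \log S_{\ell_{\sigma_m}} + \sum_{i=\sigma_m+1}^{m} T_i \\
&\ge \log S_{\ell_{\sigma_m}} + (Y_m - Y_{\sigma_m}) + \sum_{i=\sigma_m+1}^{m} k\eta_{\ell_{i-1}}
\end{aligned}
\tag{21}
\]

since then $S_{\ell_i} < \mu/2$ for all $i \in [\sigma_m, m - 1]$ and hence also $\mathbb{E}[T_i \mid \mathcal{G}_{i-1}] \ge k\eta_{\ell_{i-1}}$.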
Suppose for a moment that there is a positive probability that $S_{\ell_m}$ stays within
$(0, \mu/2)$ indefinitely, starting from some index $m_1 \ge 1$. Then, there is an infinite $\tau_i$,
and consequently $\sigma_m \le \sigma < \infty$ for all $m \ge 1$. But as $Y_m$ converges, $|Y_m - Y_{\sigma_m}|$ is
a.s. finite, and since $\sum_m \eta_{\ell_m} = \infty$ by Assumptions 17 and 19, the inequality (21)
implies that $S_{\ell_m} \ge \mu/2$ for sufficiently large $m$, which is a contradiction. That is, the
stopping times $\tau_i$ for all $i \ge 1$ must be a.s. finite, whenever $S_{\ell_m} < \mu/2$ for infinitely
many indices $m \ge 1$.

For the rest of the proof, suppose $S_{\ell_m} < \mu/2$ for infinitely many indices $m \ge 1$.
Observe that since Ym — > Y^o, there exists an a.s. finite index 777-2 SO that Ym — ^oo ^ 
— l/21og2 for all m > m2. As 7]^ — and am — oo, there is an a.s. finite rris such 
that ^o-™._i > —1/2 log 2 for all m > 7773. For all m > max{7r72, "^3} and whenever 
Si^ < /x/2, it thereby holds that 

log Se„^ > log Se^^ - (F„ -Y^J> log + - ^ log 2 

> log| -log2 = log 6. 

The case $S_{\ell_m} \ge \mu/2$ trivially satisfies the above estimate, concluding the proof. □

As a consequence of Theorem 21, one can establish a strong law of large numbers
for the unconstrained AM algorithm running with a Laplace target distribution.
Essentially, the only ingredient that needs to be checked is that the simultaneous
geometric ergodicity condition holds. This is verified in the next lemma, whose proof
is given in Appendix C.

Lemma 22. Suppose that the template proposal distribution $q$ is everywhere positive
and non-increasing away from the origin: $q(z) \ge q(w)$ for all $|z| \le |w|$. Suppose also
that $\pi(x) := \frac{1}{2b} \exp\big(-\frac{|x - m|}{b}\big)$ with a mean $m \in \mathbb{R}$ and a scale $b > 0$. Then, for all
$L > 0$, there are positive constants $M, b$ such that the following drift and minorisation
condition are satisfied for all $s \ge L$ and measurable $A \subset \mathbb{R}$:

\[
P_s V(x) \le \lambda_s V(x) + b \mathbb{1}_C(x), \quad \forall x \in \mathbb{R} \tag{22}
\]
\[
P_s(x, A) \ge \delta_s \nu(A), \quad \forall x \in C \tag{23}
\]

where $V : \mathbb{R} \to [1, \infty)$ is defined as $V(x) := (\sup_z \pi(z))^{1/2} \pi^{-1/2}(x)$, the set $C :=
[m - M, m + M]$, the probability measure $\nu$ is concentrated on $C$ and $P_s V(x) :=
\int V(y) P_s(x, dy)$. Moreover, $\lambda_s, \delta_s \in (0, 1)$ satisfy for all $s \ge L$

\[
\max\{(1 - \lambda_s)^{-1}, \delta_s^{-1}\} \le c s^{\gamma} \tag{24}
\]

for some constants $c, \gamma > 0$ that may depend on $L$.
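The drift inequality (22) can be checked numerically for a concrete case. The sketch below evaluates $1 - P_sV(x)/V(x)$ by quadrature, assuming the standard Laplace target ($m = 0$, $b = 1$), a Gaussian template proposal with variance $s$ (that is, $\theta = 1$), and $V \propto \pi^{-1/2}$; the function name, grid size and integration width are illustrative assumptions.

```python
import numpy as np

def drift_gap(x, s, n_grid=40001, width=12.0):
    """Numerically evaluate 1 - P_s V(x) / V(x) for a Metropolis kernel.

    Assumptions (illustrative): standard Laplace target, Gaussian
    proposal N(x, s), and V proportional to pi**(-1/2).
    """
    sd = np.sqrt(s)
    y = np.linspace(x - width * sd, x + width * sd, n_grid)
    q = np.exp(-0.5 * ((y - x) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    log_ratio = np.abs(x) - np.abs(y)      # log(pi(y) / pi(x))
    accept = np.minimum(1.0, np.exp(log_ratio))
    v_ratio = np.exp(-0.5 * log_ratio)     # V(y) / V(x)
    dy = y[1] - y[0]
    # 1 - PV/V equals the integral of (1 - V(y)/V(x)) * acceptance * proposal
    return np.sum((1.0 - v_ratio) * accept * q) * dy
```

A positive value of `drift_gap(x, s)` for large $|x|$ corresponds to $\lambda_s < 1$ in (22); how the gap shrinks as `s` grows gives a feel for the polynomial bound (24), though this sketch does not verify that bound.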



Theorem 23. Assume the adaptation weights $(\eta_n)_{n \ge 2}$ satisfy Assumptions 17 and
19, and the template proposal density $q$ and the target distribution $\pi$ satisfy the
assumptions in Lemma 22. If the functional $f$ satisfies $\sup_{x \in \mathbb{R}} \pi^{\gamma}(x)|f(x)| < \infty$ for
some $\gamma \in (0, 1/2)$, then $\frac{1}{n}\sum_{k=1}^{n} f(X_k) \to \int f(x)\pi(x)\,dx$ almost surely as $n \to \infty$.
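To make the setting of the theorem concrete, the following is a minimal sketch of the one-dimensional AM algorithm without any lower bound on $S_n$, run on a standard Laplace target. It assumes a Gaussian template proposal, the scaling `theta = 2.38**2`, step sizes $\eta_{n+1} = 1/(n+1)$ and the initial values shown; all of these, and the function name, are assumptions made for illustration.

```python
import numpy as np

def am_laplace_1d(n_steps, theta=2.38**2, seed=0):
    """Unconstrained 1-D AM sketch on the standard Laplace target.

    Returns the ergodic average of f(x) = x and the smallest S_n seen.
    """
    rng = np.random.default_rng(seed)
    x, m, s = 0.0, 0.0, 1.0
    running_sum, min_s = 0.0, s
    for n in range(1, n_steps + 1):
        eta = 1.0 / (n + 1)  # assumed adaptation weights
        y = x + np.sqrt(theta * s) * rng.standard_normal()
        # Metropolis accept/reject with log pi(y) - log pi(x) = |x| - |y|
        if np.log(rng.random()) < abs(x) - abs(y):
            x = y
        s = (1.0 - eta) * s + eta * (x - m) ** 2
        m = (1.0 - eta) * m + eta * x
        running_sum += x
        min_s = min(min_s, s)
    return running_sum / n_steps, min_s
```

The returned average estimates the mean of the target (zero here), and the second value records how small $S_n$ became along the run.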
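Proof. The conditions of Theorem 21 are clearly satisfied, implying that for any $\epsilon > 0$ there is
a $\kappa = \kappa(\epsilon) > 0$ such that the event

\[
B_\kappa := \Big\{ \inf_{n} S_n \ge \kappa \Big\}
\]

has a probability $\mathbb{P}(B_\kappa) \ge 1 - \epsilon$.

The inequalities (22) and (23) of Lemma 22 with the bound (24) imply, using
[17, Proposition 10 and Lemma 15], that for any $\beta > 0$ there is a constant $A =
A(\epsilon, \beta) < \infty$ such that $\mathbb{P}(B_\kappa \cap \{\max\{|S_n|, |M_n|\} > A n^{\beta}\}) \le \epsilon$. Let us define the
sequence of truncation sets

\[
K_n := \big\{ (m, s) \in \mathbb{R} \times \mathbb{R}_+ : \lambda_{\min}(s) \ge \kappa,\ \max\{|s|, |m|\} \le A n^{\beta} \big\}
\]

for $n \ge 1$. Construct an auxiliary truncated process $(\tilde X_n, \tilde M_n, \tilde S_n)_{n \ge 1}$, starting from
$(\tilde X_1, \tilde M_1, \tilde S_1) \equiv (X_1, M_1, S_1)$ and for $n \ge 2$ through

\[
(\tilde M_{n+1}, \tilde S_{n+1}) = \sigma_{n+1}\Big( (\tilde M_n, \tilde S_n),\ \eta_{n+1}\big(\tilde X_{n+1} - \tilde M_n,\ (\tilde X_{n+1} - \tilde M_n)^2 - \tilde S_n\big) \Big)
\]

where the truncation function $\sigma_{n+1} : K_n \times (\mathbb{R} \times \mathbb{R}) \to K_n$ is defined as

\[
\sigma_{n+1}(z, z') = \begin{cases} z + z', & \text{if } z + z' \in K_n, \\ z, & \text{otherwise.} \end{cases}
\]

Observe that this constrained process coincides with the AM process with probability
$\mathbb{P}(\forall n \ge 1 : (\tilde X_n, \tilde M_n, \tilde S_n) = (X_n, M_n, S_n)) \ge 1 - 2\epsilon$. Moreover, [17, Theorem 2]
implies that a strong law of large numbers holds for the truncated process $(\tilde X_n)_{n \ge 1}$,
since $\sup_x |f(x)| V^{-\alpha}(x) < \infty$ for some $\alpha \in (0, 1 - \beta)$, by selecting $\beta > 0$ above
sufficiently small. Since $\epsilon > 0$ was arbitrary, the strong law of large numbers holds
for $(X_n)_{n \ge 1}$. □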



4. AM With a Fixed Proposal Component 



This section deals with the modification due to Roberts and Rosenthal [14], including
a fixed component in the proposal distribution. In terms of Section 2, the
mixing parameter in (5) satisfies $0 < \beta < 1$. Theorem 26 shows that the fixed proposal
component guarantees, with a verifiable non-restrictive Assumption 24, that
the eigenvalues of the adapted covariance parameter $S_n$ are bounded away from zero.
As in Section 3.4, this result implies an ergodicity result, Theorem 31.

Let us start by formulating the key assumption that, intuitively speaking, assures
that the adaptive chain $(X_n)_{n \ge 1}$ will have 'uniform mobility' regardless of the adaptation
parameter $s \in \mathcal{C}^d$.



Assumption 24. There exist a compactly supported probability measure $\nu$ that is
absolutely continuous with respect to the Lebesgue measure, constants $\delta > 0$ and
$c < \infty$ and a measurable mapping $\xi : \mathbb{R}^d \times \mathcal{C}^d \to \mathbb{R}^d$ such that for all $x \in \mathbb{R}^d$ and
$s \in \mathcal{C}^d$,

\[
\|\xi(x, s) - x\| \le c \quad\text{and}\quad P_{q_s}(x, A) \ge \delta \nu\big(A - \xi(x, s)\big)
\]

for all measurable sets $A \subset \mathbb{R}^d$, where $A - y := \{x - y : x \in A\}$ is the translation of
the set $A$ by $y \in \mathbb{R}^d$.

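Remark 25. In the case of the AM algorithm with a fixed proposal component, one
is primarily interested in the case where $\xi(x, s) = \xi(x)$ and for all $x \in \mathbb{R}^d$

\[
\beta q_{\mathrm{fix}}(x - y) \min\Big\{1, \frac{\pi(y)}{\pi(x)}\Big\} \ge \delta \nu\big(y - \xi(x)\big)
\]

for all $y \in \mathbb{R}^d$, where $\nu$ is a uniform density on some ball. Then, since $P_{q_s} =
(1 - \beta) P_{\tilde q_s} + \beta P_{q_{\mathrm{fix}}}$,

\[
P_{q_s}(x, A) \ge \beta P_{q_{\mathrm{fix}}}(x, A) \ge \delta \int_A \nu(y - \xi)\,dy
\]

and Assumption 24 is fulfilled by the measure $\nu(A) := \int_A \nu(y)\,dy$.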

Having Assumption 24, the lower bound on the eigenvalues of $S_n$ can be obtained
relatively easily, by a martingale argument similar to the one used in Section 3 and
in 0. 

Theorem 26. Let $(X_n, M_n, S_n)_{n \ge 1}$ be an AM process as defined in Section 2 satisfying
Assumption 24. Moreover, suppose that the adaptation weights $(\eta_n)_{n \ge 2}$ satisfy
Assumptions 17 and 19. Then,

\[
\liminf_{n \to \infty} \inf_{w \in S^d} w^T S_n w > 0
\]

where $S^d$ stands for the unit sphere.
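The statement can be explored empirically with a short simulation. The following is a minimal sketch of an AM sampler with a fixed proposal component, tracking the smallest eigenvalue of $S_n$. It assumes a Gaussian adaptive component with covariance `theta * S_n`, a standard Gaussian fixed component, mixing weight `beta = 0.1`, scaling `theta = 2.38**2 / d` and step sizes $\eta_{n+1} = 1/(n+1)$; these choices and the function name are hypothetical and only illustrate the structure.

```python
import numpy as np

def am_with_fixed_component(log_pi, x1, n_steps, beta=0.1, seed=0):
    """AM sketch with mixture proposal (1 - beta) N(0, theta S) + beta N(0, I).

    Returns the smallest eigenvalue of S_n after each step.
    """
    rng = np.random.default_rng(seed)
    d = len(x1)
    theta = 2.38**2 / d  # assumed scaling of the adaptive component
    x = np.array(x1, dtype=float)
    m, s = x.copy(), np.eye(d)
    min_eigs = np.empty(n_steps)
    for n in range(1, n_steps + 1):
        eta = 1.0 / (n + 1)
        if rng.random() < beta:
            step = rng.standard_normal(d)  # fixed component
        else:
            step = rng.multivariate_normal(np.zeros(d), theta * s)
        y = x + step
        # Both components are symmetric, so the usual Metropolis ratio applies.
        if np.log(rng.random()) < log_pi(y) - log_pi(x):
            x = y
        diff = x - m  # uses the previous mean M_n
        s = (1.0 - eta) * s + eta * np.outer(diff, diff)
        m = (1.0 - eta) * m + eta * x
        min_eigs[n - 1] = np.linalg.eigvalsh(s)[0]
    return min_eigs
```

For example, `am_with_fixed_component(lambda z: -0.5 * float(z @ z), [0.0, 0.0], 300)` runs the sketch on a two-dimensional standard Gaussian target; the trace of the returned values shows whether the smallest eigenvalue drifts toward zero in that run.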

Proof. Let us first introduce independent binary auxiliary variables $(Z_n)_{n \ge 2}$ with
$Z_1 \equiv 0$, and through

\[
\begin{aligned}
\mathbb{P}(Z_{n+1} = 1 \mid X_n, M_n, S_n, Z_n) &= \delta, \\
\mathbb{P}(Z_{n+1} = 0 \mid X_n, M_n, S_n, Z_n) &= (1 - \delta).
\end{aligned}
\]

Using this auxiliary variable, we can assume $X_n$ to follow$^1$

\[
X_{n+1} = Z_{n+1}(U_{n+1} + \Xi_n) + (1 - Z_{n+1}) R_{n+1}
\]

where $U_{n+1} \sim \nu$ is independent of $\mathcal{F}_n$ and $Z_{n+1}$, the random variable $\Xi_n :=
\xi(X_n, S_n)$ is $\mathcal{F}_n$-measurable, and $R_{n+1}$ is distributed according to the 'residual' transition
kernel $P_{S_n}(X_n, A) := (1 - \delta)^{-1}[P_{q_{S_n}}(X_n, A) - \delta\nu(A - \Xi_n)]$, valid by Assumption 24.

'^by possibly augmenting the probability space; see [^. [l^. 
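Define $S(w, \gamma) := \{v \in S^d : \|w - v\| \le \gamma\}$, the segment of the unit sphere centred
at $w \in S^d$ and having the radius $\gamma > 0$. Fix a unit vector $w \in S^d$ and define the
following random variables

\[
\Gamma_{n+2}^{(\gamma)} := \inf_{v \in S(w, \gamma)} \big( |v^T(X_{n+1} - M_n)|^2 + |v^T(X_{n+2} - M_{n+1})|^2 \big)
\]

for all $n \ge 1$. Denote $G_{n+1} := X_{n+1} - M_n$ and $E_{n+1} := \Xi_{n+1} - X_{n+1}$, and observe
that whenever $Z_{n+2} = 1$, it holds that

\[
X_{n+2} - M_{n+1} = U_{n+2} + \Xi_{n+1} - M_{n+1} = U_{n+2} + (1 - \eta_{n+1}) G_{n+1} + E_{n+1}
\]

and we may write

\[
Z_{n+2} \Gamma_{n+2}^{(\gamma)} = Z_{n+2} \inf_{v \in S(w, \gamma)} \big( |v^T G_{n+1}|^2 + |v^T(U_{n+2} + \lambda_{n+1} G_{n+1} + E_{n+1})|^2 \big)
\]

where $\lambda_n := 1 - \eta_n \in (0, 1)$ for all $n \ge 2$. Consequently, we may apply Lemma 27
below to find constants $\gamma, \mu > 0$ such that

\[
\mathbb{P}\big( Z_{n+2} \Gamma_{n+2}^{(\gamma)} \ge \mu \mid \mathcal{F}_n \big) \ge \frac{\delta}{2}. \tag{25}
\]

Hereafter, assume $\gamma > 0$ is fixed such that (25) holds, and denote $\Gamma_{n+2} := \Gamma_{n+2}^{(\gamma)}$ and
$S(w) := S(w, \gamma)$.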

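Consider the random variables

\[
\begin{aligned}
D_{n+2} &:= \inf_{v \in S(w)} \big( \eta_{n+1} |v^T(X_{n+1} - M_n)|^2 + \eta_{n+2} |v^T(X_{n+2} - M_{n+1})|^2 \big) \\
&\ge \min\{\eta_{n+1}, \eta_{n+2}\} \Gamma_{n+2} \ge \eta_* \eta_{n+1} \Gamma_{n+2}
\end{aligned}
\tag{26}
\]

where $\eta_* := \inf_{k \ge 2} \eta_{k+1}/\eta_k > 0$ by Assumption 17. Define the indices $\ell_n := 2n - 1$
for $n \ge 1$ and let

\[
T_n := \eta_* \min\{\mu, Z_{\ell_n} \Gamma_{\ell_n}\}
\]

for all $n \ge 2$. Define the $\sigma$-algebras $\mathcal{G}_n := \mathcal{F}_{\ell_n}$ for $n \ge 1$ and observe that
$\mathbb{E}[T_{n+1} \mid \mathcal{G}_n] \ge \eta_* \mu \delta / 2$ by (25). Construct a martingale starting from $Y_1 \equiv 0$ and
having the differences $dY_{n+1} := \eta_{\ell_n + 1}(T_{n+1} - \mathbb{E}[T_{n+1} \mid \mathcal{G}_n])$. The martingale $Y_n$
converges to an a.s. finite limit $Y_\infty$ as in Theorem 20.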

Define also $\eta^* := \sup_{k \ge 2} \eta_{k+1}/\eta_k < \infty$ and $\kappa := \inf_{k \ge 2} (1 - \eta_k) > 0$, and let

6:=^>0. 

8?7* 

Denote $S_n^{(w)} := \inf_{v \in S(w)} v^T S_n v$ and define the stopping times $\tau_1 \equiv 1$ and for $k \ge 2$
through

\[
\tau_k := \inf\{ n > \tau_{k-1} : S_{\ell_n}^{(w)} \le b,\ S_{\ell_{n-1}}^{(w)} > b \}
\]

with the convention $\inf \emptyset = \infty$. That is, $\tau_k$ record the times when $S_{\ell_n}^{(w)}$ enters $(0, b]$.
Using $\tau_k$, define the latest such time up to $n$ by $\sigma_n := \sup\{\tau_k : k \ge 1,\ \tau_k \le n\}$.
Observe that for any $n \ge 2$ such that $S_{\ell_n}^{(w)} < b$, one may write

qiw) _ n{w) I f p, ^ q{w) ^ q{w) 

k=an 

71-1 
k = <Jn 

k=a„ ^ 



by (l26l) and since for all k G n — 1] one may estimate S}^1^ < (1 — r/^+i) ^S^^ _^ < 
That is, for any n > 2 such that S^^^ < b 

si:^ > si:! + - + E (e [T.+i I - ^) 

> + - i'.j + ^ i: -x.+i- 

fc=(T„ 

As in the proof of Theorem 21, this is sufficient to find an $\epsilon > 0$ such that

\[
\liminf_{n \to \infty} S_n^{(w)} \ge \epsilon.
\]

Finally, take a finite number of unit vectors $w_1, \ldots, w_N \in S^d$ such that the
corresponding segments $S(w_1), \ldots, S(w_N)$ cover $S^d$. Then,

\[
\liminf_{n \to \infty} \inf_{v \in S^d} v^T S_n v = \liminf_{n \to \infty} \min\{ S_n^{(w_1)}, \ldots, S_n^{(w_N)} \} \ge \epsilon. \qquad □
\]
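The final covering step relies on the fact that the infimum of $v^T S v$ over the unit sphere, which equals the smallest eigenvalue of $S$, can be controlled through finitely many directions. The sketch below illustrates this in dimension two by scanning a hypothetical grid of unit vectors on the half circle; the function name and grid size are assumptions.

```python
import numpy as np

def min_quadratic_form_on_circle(s, n_dirs=3600):
    """Minimum of v^T S v over a finite grid of unit vectors in R^2."""
    # Directions v and -v give the same value, so half the circle suffices.
    angles = np.linspace(0.0, np.pi, n_dirs, endpoint=False)
    v = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.einsum("ij,jk,ik->i", v, s, v).min()
```

The grid minimum is never below the smallest eigenvalue and approaches it as the grid is refined, which mirrors how a finite cover of $S^d$ by segments suffices in the proof.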

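Lemma 27. Suppose $\mathcal{F}_n \subset \mathcal{F}_{n+1}$ are $\sigma$-algebras, and $G_{n+1}$ and $E_{n+1}$ are $\mathcal{F}_{n+1}$-measurable
random variables, satisfying $\|E_{n+1}\| \le M$ for some constant $M < \infty$.
Moreover, $U_{n+2}$ is a random variable independent of $\mathcal{F}_{n+1}$, having a distribution $\nu$
fulfilling the conditions in Assumption 24.

Let $S^d := \{u \in \mathbb{R}^d : \|u\| = 1\}$ stand for the unit sphere and denote by $S(w, \gamma) :=
\{v \in S^d : \|w - v\| \le \gamma\}$ the segment of the unit sphere centred at $w \in S^d$ and having
the radius $\gamma > 0$. There exist constants $\gamma, \mu > 0$ such that

\[
\mathbb{P}\Big( \inf_{v \in S(w, \gamma)} \big( |v^T G_{n+1}|^2 + |v^T(U_{n+2} + \lambda G_{n+1} + E_{n+1})|^2 \big) \ge \mu \;\Big|\; \mathcal{F}_{n+1} \Big) \ge \frac{1}{2}
\]

for any $w \in S^d$ and any constant $\lambda \in (0, 1)$, almost surely.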

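Proof. Since $\nu$ is absolutely continuous with respect to the Lebesgue measure, one
can show that there exist values $b, \gamma > 0$ such that

\[
\inf_{w \in S^d} \inf_{e \in B(0, M)} \nu\Big( \Big\{ u \in \mathbb{R}^d : \inf_{v \in S(w, \gamma)} |v^T(u + e)| \ge b \Big\} \Big) \ge \frac{1}{2} \tag{27}
\]

where $B(0, M) := \{y \in \mathbb{R}^d : \|y\| \le M\}$ denotes a centred ball of radius $M$. Hereafter,
fix $\gamma, b > 0$ such that (27) holds and let $a := b/2$.
Fix a unit vector $w \in S^d$ and consider the set

\[
\begin{aligned}
A &:= \Big\{ \inf_{v \in S(w, \gamma)} \big( |v^T G_{n+1}|^2 + |v^T(U_{n+2} + \lambda G_{n+1} + E_{n+1})|^2 \big) < a^2 \Big\} \\
&\subset \Big\{ \inf_{v \in S(w, \gamma) \,:\, |v^T G_{n+1}| < a} |v^T(U_{n+2} + \lambda G_{n+1} + E_{n+1})| < a \Big\} \\
&\subset \Big\{ \inf_{v \in S(w, \gamma) \,:\, |v^T G_{n+1}| < a} \big( |v^T(U_{n+2} + E_{n+1})| - \lambda |v^T G_{n+1}| \big) < a \Big\} \\
&\subset \Big\{ \inf_{v \in S(w, \gamma)} |v^T(U_{n+2} + E_{n+1})| < 2a \Big\}.
\end{aligned}
\]

Since $U_{n+2}$ is independent of $\mathcal{F}_{n+1}$, and since $E_{n+1}$ is $\mathcal{F}_{n+1}$-measurable, one may
estimate

\[
\begin{aligned}
\mathbb{P}(A^{\complement} \mid \mathcal{F}_{n+1}) &\ge \inf_{e \in B(0, M)} \mathbb{P}\Big( \inf_{v \in S(w, \gamma)} |v^T(U_{n+2} + e)| \ge 2a \Big) \\
&= \inf_{e \in B(0, M)} \nu\Big( \Big\{ u \in \mathbb{R}^d : \inf_{v \in S(w, \gamma)} |v^T(u + e)| \ge 2a \Big\} \Big) \ge \frac{1}{2}
\end{aligned}
\]

by (27), almost surely, concluding the proof by $\mu := a^2$. □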



Corollary 28. Assume $\pi$ is bounded, stays bounded away from zero on compact sets,
is differentiable on the tails, and has regular contours, that is,

\[
\liminf_{\|x\| \to \infty} \frac{x}{\|x\|} \cdot \frac{\nabla \pi(x)}{\|\nabla \pi(x)\|} < 0.
\]

Let $(X_n, M_n, S_n)_{n \ge 1}$ be an AM process as defined in Section 2 using a mixture proposal
(5) with a mixing weight satisfying $\beta \in (0, 1)$ and the density $q_{\mathrm{fix}}$ is bounded away
from zero in some neighbourhood of the origin. Moreover, suppose that the adaptation
weights $(\eta_n)_{n \ge 2}$ satisfy Assumptions 17 and 19. Then,

\[
\liminf_{n \to \infty} \inf_{w \in S^d} w^T S_n w > 0.
\]

Proof. In light of Theorem 26, it is sufficient to check Assumption 24, or in fact the
conditions in Remark 25. Let $L > 0$ be sufficiently large so that $\inf_{\|x\| \ge L} \frac{x}{\|x\|} \cdot \frac{\nabla \pi(x)}{\|\nabla \pi(x)\|} <
0$. Jarner and Hansen [11, proof of Theorem 4.3] show that there is an $\epsilon' > 0$ and
$K' > 0$ such that the cone

\[
E(x) := \Big\{ x - a u : 0 < a < K',\ u \in S^d,\ \Big\| u - \frac{x}{\|x\|} \Big\| < \epsilon' \Big\}
\]

is contained in the set $A(x) := \{y \in \mathbb{R}^d : \pi(y) \ge \pi(x)\}$, for all $\|x\| \ge L$.

Let $r' > 0$ be sufficiently small to ensure that $\inf_{\|z\| \le r'} q_{\mathrm{fix}}(z) \ge \delta' > 0$. There is
a $r = r(\epsilon', K') \in (0, r'/2)$ and measurable $\xi : \mathbb{R}^d \to \mathbb{R}^d$ such that $\|\xi(x) - x\| \le r'/2$
and the ball $B(x, r) := \{y : \|y - \xi(x)\| \le r\}$ is contained in the cone $E(x)$. Define
$\nu(x) := c_r^{-1} \mathbb{1}_{B(0, r)}(x)$ where $c_r := |B(0, r)|$ is the Lebesgue measure of $B(0, r)$, and
let $\xi(x) := x$ for the remaining $\|x\| < L$. Now, we have for $\|x\| \ge L$ that

\[
\beta q_{\mathrm{fix}}(x - y) \min\Big\{1, \frac{\pi(y)}{\pi(x)}\Big\} \ge \beta \delta' c_r \nu\big(y - \xi(x)\big).
\]

Since $\pi$ is bounded and bounded away from zero on compact sets, the ratio
$\pi(y)/\pi(x) \ge \delta'' > 0$ for all $x, y \in B(0, L + r')$ with $\|x - y\| \le r'$. Therefore, for
all $\|x\| < L$, it holds that

\[
\beta q_{\mathrm{fix}}(x - y) \min\Big\{1, \frac{\pi(y)}{\pi(x)}\Big\} \ge \beta \delta' \delta'' c_r \nu(y - x). \qquad □
\]

Remark 29. The conditions of Corollary 28 are fulfilled by many practical densities
$\pi$ (see [11] for examples), and are fairly easy to verify in practice. Assumption 24
holds, however, more generally, excluding only densities with unbounded density or
having irregular contours.

Remark 30. It is not necessary for Theorem 26 and Corollary 28 to hold that the
adaptive proposal densities $\{q_s\}_{s \in \mathcal{C}^d}$ have the specific form discussed in Section 2.
The results require only that a suitable fixed proposal component is used so that
Assumption 24 holds. In Theorem 31 below, however, the structure of $\{q_s\}_{s \in \mathcal{C}^d}$ is
required.

Let us record the following ergodicity result, which is a counterpart to [17, Theorem
17] formulating a strong law of large numbers for the original algorithm (S1)-(S3)
with the covariance parameter (1).

Theorem 31. Suppose the target density $\pi$ is continuous and differentiable, stays
bounded away from zero on compact sets and has super-exponentially decaying tails
with regular contours,

\[
\limsup_{\|x\| \to \infty} \frac{x}{\|x\|^{p}} \cdot \nabla \log \pi(x) = -\infty \quad\text{and}\quad \limsup_{\|x\| \to \infty} \frac{x}{\|x\|} \cdot \frac{\nabla \pi(x)}{\|\nabla \pi(x)\|} < 0,
\]

respectively, for some $p > 1$.

Let $(X_n, M_n, S_n)_{n \ge 1}$ be an AM process as defined in Section 2 using a mixture
proposal $q_s(z) = (1 - \beta)\tilde q_s(z) + \beta q_{\mathrm{fix}}(z)$ where $\tilde q_s$ stands for a zero-mean Gaussian
density with covariance $s$, the mixing weight satisfies $\beta \in (0, 1)$ and the density $q_{\mathrm{fix}}$
is bounded away from zero in some neighbourhood of the origin. Moreover, suppose
that the adaptation weights $(\eta_n)_{n \ge 2}$ satisfy Assumption 19.

Then, for any function $f : \mathbb{R}^d \to \mathbb{R}$ with $\sup_{x \in \mathbb{R}^d} \pi^{\gamma}(x)|f(x)| < \infty$ for some $\gamma \in
(0, 1/2)$,

\[
\frac{1}{n} \sum_{k=1}^{n} f(X_k) \to \int_{\mathbb{R}^d} f(x)\pi(x)\,dx \quad\text{as } n \to \infty
\]

almost surely.
Proof. The conditions of Corollary 28 are satisfied, implying that for any $\epsilon > 0$ there
is a $\kappa = \kappa(\epsilon) > 0$ such that $\mathbb{P}(\inf_n \lambda_{\min}(S_n) \ge \kappa) \ge 1 - \epsilon$ where $\lambda_{\min}(s)$ denotes the
smallest eigenvalue of $s$. By [17, Proposition 18], there is a compact set $C_\kappa \subset \mathbb{R}^d$, a
probability measure $\nu_\kappa$ on $C_\kappa$, and $b_\kappa < \infty$ such that for all $s \in \mathcal{C}^d$ with $\lambda_{\min}(s) \ge \kappa$,
it holds that

\[
P_{\tilde q_s} V(x) \le \lambda_s V(x) + b_\kappa \mathbb{1}_{C_\kappa}(x), \quad \forall x \in \mathbb{R}^d \tag{29}
\]
\[
P_{\tilde q_s}(x, A) \ge \delta_s \nu_\kappa(A), \quad \forall x \in C_\kappa \tag{30}
\]

where $V(x) := (\sup_x \pi(x))^{1/2} \pi^{-1/2}(x) \ge 1$ and the constants $\lambda_s, \delta_s \in (0, 1)$ satisfy the
bound

\[
(1 - \lambda_s)^{-1} \vee \delta_s^{-1} \le c_1 \det(s)^{1/2} \tag{31}
\]

for some constant $c_1 \ge 1$. Likewise, there is a compact $D_f \subset \mathbb{R}^d$, a probability
measure $\mu_f$ on $D_f$, and constants $b_f < \infty$ and $\lambda_f, \delta_f \in (0, 1)$, so that (29) and (30)
hold with $P_{q_{\mathrm{fix}}}$ [11, Theorem 4.3]. Put together, (29) and (30) hold for $P_{q_s}$ for all $s \in \mathcal{C}^d$
with $\lambda_{\min}(s) \ge \kappa$, perhaps with different constants, but satisfying a bound (31), with
another $c_2 \ge c_1$.

The rest of the proof follows as in Theorem 23 by construction of an auxiliary
process $(\tilde X_n, \tilde M_n, \tilde S_n)_{n \ge 1}$ truncated so that for given $\epsilon > 0$, $\kappa \le \lambda_{\min}(\tilde S_n) \le a n^{\beta}$ and
$|\tilde M_n| \le a n^{\beta}$ and where the constant $a = a(\epsilon, \kappa)$ is chosen so that the truncated process
coincides with the original AM process with probability $\ge 1 - 2\epsilon$. Theorem 2 of [17]
ensures that the strong law of large numbers holds for the constrained process, and
letting $\epsilon \to 0$ implies the claim. □

Remark 32. In the case $\eta_n := n^{-1}$, Theorem 31 implies that with probability one,
$M_n \to m_\pi := \int x \pi(x)\,dx$ and $S_n \to s_\pi := \int x x^T \pi(x)\,dx - m_\pi m_\pi^T$, the true mean and
covariance of $\pi$, respectively.

Remark 33. Theorem 31 holds also when using multivariate Student distributions
$\{\tilde q_s\}_{s \in \mathcal{C}^d}$, as [18, Proposition 26] extends [17, Proposition 18] to cover this case.
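With the weights $\eta_n = n^{-1}$, the mean recursion $M_n = (1 - \eta_n) M_{n-1} + \eta_n X_n$ reproduces the ordinary sample mean, which is why the limit is the mean of $\pi$ once a law of large numbers is available. A minimal sketch of this identity follows; the function name is an assumption and the input is any one-dimensional array of samples.

```python
import numpy as np

def recursive_mean(xs):
    """Run M_n = (1 - 1/n) * M_{n-1} + (1/n) * X_n starting from M_1 = X_1."""
    m = xs[0]
    for n, x in enumerate(xs[1:], start=2):
        eta = 1.0 / n
        m = (1.0 - eta) * m + eta * x
    return m
```

The covariance recursion with the same weights behaves similarly but is not exactly the sample covariance at finite $n$, since it uses the previous mean $M_{n-1}$ in the squared deviation.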
Appendix A. The Kolmogorov-Rogozin Inequality

Define the concentration function $Q(X; \lambda)$ of a random variable $X$ by

\[
Q(X; \lambda) := \sup_{x \in \mathbb{R}} \mathbb{P}(X \in [x, x + \lambda])
\]

for all $\lambda \ge 0$.

Theorem 34. Let $X_1, X_2, \ldots$ be mutually independent random variables. There is a
universal constant $c > 0$ such that

\[
Q\Big( \sum_{k=1}^{n} X_k; L \Big) \le c\, \frac{L}{\lambda} \Big[ \sum_{k=1}^{n} \big( 1 - Q(X_k; \lambda) \big) \Big]^{-1/2}
\]

for all $L \ge \lambda > 0$.

Proof. Rogozin's original work uses combinatorial results, and Esseen's alternative
proof uses characteristic functions. □
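The concentration function is easy to compute for an empirical distribution, which may help in building intuition for the inequality. The sketch below assumes a finite sample with equal weights and a closed window $[x, x + \lambda]$; the function name is hypothetical.

```python
import numpy as np

def empirical_concentration(sample, lam):
    """Concentration function Q(X; lam) of the empirical law of `sample`."""
    xs = np.sort(np.asarray(sample, dtype=float))
    # For a discrete law the supremum is attained with the left end of the
    # window [x, x + lam] placed at one of the atoms.
    right = np.searchsorted(xs, xs + lam, side="right")
    counts = right - np.arange(len(xs))
    return counts.max() / len(xs)
```

For the sample `[0.0, 0.5, 1.0, 3.0]` and `lam = 1.0`, the window starting at 0 captures three of the four points, so the value is 0.75.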
Appendix B. A Coupling Construction

Theorem 35. Suppose $\mu$ and $\nu$ are probability measures and the random variable
$X \sim \mu$. Then, possibly by augmenting the probability space, there is another random
variable $Y$ such that $Y \sim \nu$ and $\mathbb{P}(X = Y) = 1 - \|\mu - \nu\|$.

Proof (adopted from Theorem 3 in [19]). Define the measure $\rho := \mu + \nu$, and the
densities $g := d\mu/d\rho$ and $h := d\nu/d\rho$, existing by the Radon-Nikodym theorem. Let
us introduce two auxiliary variables $U$ and $Z$ independent of each other and $X$, whose
existence is ensured by possible augmentation of the probability space. Then, $Y$ is
defined through

\[
Y = \mathbb{1}_{\{U \le r(X)\}} X + \mathbb{1}_{\{U > r(X)\}} Z
\]

where the 'coupling probability' $r$ is defined as $r(y) := \min\{1, h(y)/g(y)\}$ whenever
$g(y) > 0$ and $r(y) := 1$ otherwise. The variable $U$ is uniformly distributed on $[0, 1]$. If
$r(y) = 1$ for $\rho$-almost every $y$, then the choice of $Z$ is irrelevant, $\mu = \nu$, and the claim
is trivial. Otherwise, the variable $Z$ is distributed following the 'residual measure' $\xi$
given as

\[
\xi(A) := \frac{\int_A \max\{0, h - g\}\,d\rho}{\int \max\{0, h - g\}\,d\rho}.
\]

Observe that $\int \max\{0, h - g\}\,d\rho = \int \max\{0, g - h\}\,d\rho > 0$ in this case, so $\xi$ is a well
defined probability measure.
Let us check that $Y \sim \nu$,

\[
\begin{aligned}
\mathbb{P}(Y \in A) &= \int_A r\,d\mu + \xi(A) \int (1 - r)\,d\mu \\
&= \int_A \min\{g, h\}\,d\rho + \xi(A) \int_{h < g} (g - h)\,d\rho \\
&= \int_A \big( \min\{g, h\} + \max\{0, h - g\} \big)\,\rho(dx) = \nu(A).
\end{aligned}
\]

Moreover, by observing that $r(y) = 1$ in the support of $\xi$, one has

\[
\mathbb{P}(X = Y) = \int r\,d\mu = \int \min\{g, h\}\,d\rho = 1 - \int_{g < h} (h - g)\,d\rho = 1 - \|\nu - \mu\|
\]

since $\int_{g < h} (h - g)\,d\rho = \int_{h < g} (g - h)\,d\rho = \sup_f \big| \int f (h - g)\,d\rho \big| = \|\nu - \mu\|$ where the
supremum is taken over all measurable functions $f$ taking values in $[0, 1]$. □
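On a finite state space the construction in the proof can be written out explicitly. The sketch below builds the joint law of $(X, Y)$ for two probability vectors, following the same steps: keep $Y = X$ with probability $r(X)$, otherwise draw $Y$ from the residual measure. The function name and the matrix representation of the joint law are assumptions made for illustration.

```python
import numpy as np

def maximal_coupling_joint(mu, nu):
    """Joint pmf J[x, y] of the coupling (X, Y) with X ~ mu and Y ~ nu."""
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    # On a finite set the density ratio h/g reduces to nu/mu pointwise.
    r = np.ones_like(mu)
    pos = mu > 0
    r[pos] = np.minimum(1.0, nu[pos] / mu[pos])
    residual = np.maximum(0.0, nu - mu)
    joint = np.diag(mu * r)  # event {U <= r(X)}: Y = X
    total = residual.sum()
    if total > 0:
        xi = residual / total  # 'residual measure' of Z
        joint += np.outer(mu * (1.0 - r), xi)  # event {U > r(X)}: Y = Z
    return joint
```

The row sums of the returned matrix recover `mu`, the column sums recover `nu`, and the trace equals one minus the total variation distance, in line with the statement of Theorem 35.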

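Appendix C. Proof of Lemma 22

Observe that without loss of generality it is sufficient to check the case $m = 0$ and
$b = 1$, that is, consider the standard Laplace distribution $\pi(x) := \frac{1}{2} e^{-|x|}$.
Let $x > 0$ and start by writing

\[
1 - \frac{P_s V(x)}{V(x)} = \int_{-x}^{x} a(x, y) q_s(y - x)\,dy - \int_{|y| > x} b(x, y) q_s(y - x)\,dy
\]

where

\[
a(x, y) := 1 - \sqrt{\frac{\pi(x)}{\pi(y)}} = 1 - e^{-\frac{x - |y|}{2}} \quad\text{and}\quad
b(x, y) := \sqrt{\frac{\pi(y)}{\pi(x)}} \Big( 1 - \sqrt{\frac{\pi(y)}{\pi(x)}} \Big) = e^{-\frac{|y| - x}{2}} \big( 1 - e^{-\frac{|y| - x}{2}} \big).
\]

Compute then that

\[
\int_0^x a(x, y) q_s(y - x)\,dy - \int_x^{2x} b(x, y) q_s(y - x)\,dy = \int_0^x \big(1 - e^{-\frac{z}{2}}\big)^2 q_s(z)\,dz.
\]

The estimates

\[
\begin{aligned}
\int_{-x}^{0} a(x, y) q_s(y - x)\,dy &\ge q_s(2x) \int_{-x}^{0} a(x, y)\,dy = q_s(2x) \int_0^x \big(1 - e^{-\frac{z}{2}}\big)\,dz \\
\int_{-\infty}^{-x} b(x, y) q_s(y - x)\,dy &\le q_s(2x) \int_{-\infty}^{-x} b(x, y)\,dy = q_s(2x) \int_0^{\infty} e^{-\frac{z}{2}} \big(1 - e^{-\frac{z}{2}}\big)\,dz
\end{aligned}
\]

due to the non-increasing property of $q_s$ yield

\[
\int_{-x}^{0} a(x, y) q_s(y - x)\,dy - \int_{-\infty}^{-x} b(x, y) q_s(y - x)\,dy
\ge q_s(2x) \Big[ \int_0^x \big(1 - e^{-\frac{z}{2}}\big)\,dz - \int_0^{\infty} e^{-\frac{z}{2}}\,dz \Big] \ge 0
\]

for any sufficiently large $x > 0$. Similarly, one obtains

\[
\frac{1}{2} \int_0^x \big(1 - e^{-\frac{z}{2}}\big)^2 q_s(z)\,dz - \int_{2x}^{\infty} b(x, y) q_s(y - x)\,dy \ge 0
\]

for large enough $x > 0$.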

Summing up, letting M > be sufficiently large, then for x > M and s > L > 



^ PsVjx) ^ 1 
V{x) - 



T P^ 1 /"Al 

- y (1 - e-'^YU^)dz > -UM) J (1 - e-i)'dz 



for some constants Ci,C2 > 0. The same inequality holds also for —x < —M due to 
symmetry. The simple bound PsV{x) < 2V{x) observed from (1521) with the above 
estimate establishes The minorisation inequality fl2^ holds since for all a; G C 
one may write 



Ps{x,A)> [ max (1,441 



Qsiy - x)dy 



> — mf qs{z-y) / dy > c^s ' v{A). 

SUp^7r(2;) s>L,z.y(^C J 
where $\nu(A) := |A \cap C| / |C|$ with $|\cdot|$ denoting the Lebesgue measure. □